Claude  Elwood  Shannon 
Miscellaneous  Writings 

Edited  by 

N.  J.  A.  Sloane 
Aaron  D.  Wyner 

Back  in  1993,  the  late  Aaron  Wyner  and  I  edited  Claude  Elwood  Shannon's 
papers,  and  most  of  them  appeared  in  a  volume  (Claude  Elwood 
Shannon's  Collected  Papers)  which  was  published  by  the  IEEE  Press. 

However,  there  were  a  number  of  items  written  by  Shannon  of  lesser 
interest  which  we  did  not  include  (some  declassified  wartime  memoranda, 
obscure  AT&T  Bell  Labs  memos,  some  mimeographed  MIT  lecture  notes,  etc.). 

These  we  put  into  a  binder,  held  together  by  an  Acco  metal  strip. 

We made half a dozen copies, and gave copies to the Library of Congress,
the British Library, the Bell Laboratories Library, the MIT Library,
to Claude Shannon himself, and to one or two other places.

Over  the  years  many  people  have  asked  me  if  it  was  possible  to  get  access 
to  this  collection. 

I have now had this volume scanned and converted to PDF files.
The  total  size  of  the  files  is  about  450  megabytes. 

Neil  J.  A.  Sloane,  October  13,  2013 

Mathematical  Sciences  Research  Center,  AT&T  Bell  Laboratories,  Murray  Hill, 
New  Jersey  07974 


File 1: Front matter

This volume contains the following items. Bracketed numbers refer to the bibliography.

[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum
MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs.

[7] "A Study of the Deflection Mechanism and Some Results on Rate Finders,"
Report to National Defense Research Committee, Div. 7-311-M1, circa April,
1941, 37 pp. + 15 figs.

[9] "A Height Data Smoothing Mechanism," Report to National Defense Research
Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs.

[11] "Some Experimental Results on the Deflection Mechanism," Report to National
Defense Research Committee, Div. 7-311-M1, June 26, 1941, 11 pp.

[12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8,
1941, 5 pp. + 3 figs.

[16] (With W. Feller) "On the Integration of the Ballistic Equations on the Aberdeen
Analyzer," Applied Mathematics Panel Report No. 28.1, National Defense
Research Committee, July 15, 1943, 9 pp.

[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29, 1944,
Bell Laboratories, 2 pp. + 3 Figs.

(Note that many of these files contain more than one document.)

[20]  "Counting  Up  or  Down  With  Pulse  Counters,"  Typescript,  May  31,  1944,  Bell 
Laboratories,  1  p.  +  1  fig. 

[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver,"
Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs.

[23]  "Pulse  Shape  to  Minimize  Bandwidth  With  Nonoverlapping  Pulses,"  Typescript, 
August  4,  1944,  Bell  Laboratories,  4  pp. 

[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02, Sept.
1, 1945, Bell Laboratories, 114 pp. + 25 figs.

[26]  "Mixed  Statistical  Determinate  Systems,"  Typescript,  Sept.  19,  1945,  Bell 
Laboratories,  17  pp. 

[27]  (With  R.  B.  Blackman  and  H.  W.  Bode)  "Data  Smoothing  and  Prediction  in 
Fire-Control  Systems,"  Summary  Technical  Report,  Div.  7,  National  Defense 
Research  Committee,  Vol.  1,  Gunfire  Control,  Washington,  DC,  1946,  pp.  71-159 
and  166-167.  AD  200795.  Also  in  National  Military  Establishment  Research  and 
Development  Board,  Report  #13  MGC  12/1,  August  15,  1948.  Superseded  by 
[51]  and  by  R.  B.  Blackman,  Linear  Data-Smoothing  and  Prediction  in  Theory 
and  Practice,  Addison-Wesley,  Reading,  Mass.,  1965. 

[30]  (With  C.  L.  Dolph)  "The  Transient  Behavior  of  a  Large  Number  of  Four- 
Terminal  Unilateral  Linear  Networks  Connected  in  Tandem,"  Memorandum  MM 
46-110-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs.

[31]  "Electronic  Methods  in  Telephone  Switching,"  Typescript,  October  17,  1946, 
Bell  Laboratories,  5  pp.  +  1  fig. 

[32]  "Some  Generalizations  of  the  Sampling  Theorem,"  Typescript,  March  4,  1948,  5 
pp.  +  1  fig. 

[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5 pp.

[35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March 15,
1948, 2 pp.

[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.

[45]  "Significance  and  Application  [of  Communication  Research],"  Symposium  on 
Communication  Research,  11-13  October,  1948,  Research  and  Development 
Board,  Department  of  Defense,  Washington,  DC,  pp.  14-23,  1948. 

[46]  "Note  on  Certain  Transcendental  Numbers,"  Typescript,  October  27,  1948,  Bell 
Laboratories,  1  p. 

[47]  "A  Case  of  Efficient  Coding  for  a  Very  Noisy  Channel,"  Typescript,  Nov.  18, 
1948,  Bell  Laboratories,  2  pp. 

[48]  "Note  on  Reversing  a  Discrete  Markhoff  Process,"  Typescript,  Dec.  6  1948,  Bell 
Laboratories, 2 pp. + 2 Figs.

File  104 

[49]     "Information  Theory,"  Typescript  of  abstract  of  talk  for  American  Statistical 
Society,  1949,  5  pp. 

[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2
pp.

[59] "A Digital Method of Transmitting Information," Typescript, no date, circa
1950, Bell Laboratories, 3 pp.

[72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10 pp.

[74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM 53-1400-9,
March 31, 1953, Bell Laboratories, 14 pp. + 4 figs.

[77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7
pp.

[78] "Tower of Hanoi," Typescript, April 20, 1953, Bell Laboratories, 4 pp.

[81]     "Mathmanship  or  How  to  Give  an  Explicit  Solution  Without  Actually  Solving 
the  Problem,"  Typescript,  June  3,  1953,  Bell  Laboratories,  2  pp. 

[84]     (With  E.  F.  Moore)  "The  Relay  Circuit  Synthesizer,"  Memorandum  MM  53- 
140-52,  November  30,  1953,  Bell  Laboratories,  22  pp.  +  5  figs. 

[87]     "Bounds  on  the  Derivatives  and  Rise  Time  of  a  Band  and  Amplitude  Limited 
Signal,"  Typescript,  April  8,  1954,  Bell  Laboratories,  6  pp.  +  1  Fig. 

[95]     "Concavity  of  Transmission  Rate  as  a  Function  of  Input  Probabilities," 
Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories.

[104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology,
1956 and succeeding years. Contains the following sections:

"A skeleton key to the information theory notes," 3 pp. "Bounds on the tails of
martingales and related questions," 19 pp. "Some useful inequalities for
distribution functions," 3 pp. "A lower bound on the tail of a distribution," 9
pp. "A combinatorial theorem," 1 p. "Some results on determinants," 3 pp.
"Upper and lower bounds for powers of a matrix with non-negative elements," 3
pp. "The number of sequences of a given length," 3 pp. "Characteristic for a
language with independent letters," 4 pp. "The probability of error in optimal
codes," 5 pp. "Zero error codes and the zero error capacity C0," 10 pp.
"Lower bound for Pef for a completely connected channel with feedback," 1 p.
"A lower bound for Pe when R > C," 2 pp. "A lower bound for Pe," 2 pp.
"Lower bound with one type of input and many types of output," 3 pp.
"Application of 'sphere-packing' bounds to feedback case," 8 pp. "A result for
the memoryless feedback channel," 1 p. "Continuity of Pe opt as a function of
transition probabilities," 1 p. "Codes of a fixed composition," 1 p. "Relation of
Pe to p," 2 pp. "Bound on Pe for random code by simple threshold argument," 4
pp. "A bound on Pe for a random code," 3 pp. "The Feinstein bound," 2 pp.
"Relations between probability and minimum word separation," 4 pp.
"Inequalities for decodable codes," 3 pp. "Convexity of channel capacity as a
function of transition probabilities," 1 p. "A geometric interpretation of
channel capacity," 6 pp. "Log moment generating function for the square of a
Gaussian variate," 2 pp. "Upper bound on Pe for Gaussian channel by
expurgated random code," 2 pp. "Lower bound on Pe in Gaussian channel by
minimum distance argument," 2 pp. "The sphere packing bound for the
Gaussian power limited channel," 4 pp. "The T-terminal channel," 7 pp.
"Conditions for constant mutual information," 2 pp. "The central limit theorem
with large deviations," 6 pp. "The Chernoff inequality," 2 pp. "Upper and
lower bounds on the tails of distributions," 4 pp. "Asymptotic behavior of the
distribution function," 5 pp. "Generalized Chebycheff and Chernoff
inequalities," 1 p. "Channels with side information at the transmitter," 13 pp.
"Some miscellaneous results in coding theory," 15 pp. "Error probability
bounds for noisy channels," 20 pp.

[105] "Reliable Machines from Unreliable Components," notes of five lectures,
Massachusetts Institute of Technology, Spring 1956, 24 pp.

[106] "The Portfolio Problem, and How to Pay the Forecaster," lecture notes taken by
W. W. Peterson, Massachusetts Institute of Technology, Spring, 1956, 8 pp.

[107] "Notes on Relation of Error Probability to Delay in a Noisy Channel," notes of a
lecture, Massachusetts Institute of Technology, Aug. 30, 1956, 3 pp.

[108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a lecture,
Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.

[124] "The Fourth-Dimensional Twist, or a Modest Proposal in Aid of the American
Driver in England," typescript, All Souls College, Oxford, Trinity term, 1978, 7
pp. + 8 figs.

[127] "A Rubric on Rubik Cubics," Typescript, circa 1982, 6 pp.

Claude  Elwood  Shannon 
Miscellaneous  Writings 

Edited  by 

N.  J.  A.  Sloane 
Aaron  D.  Wyner 

Mathematical  Sciences  Research  Center,  AT&T  Bell  Laboratories,  Murray  Hill, 
New  Jersey  07974 


This volume contains all of Claude Elwood Shannon's writings that we did not include in
his Collected Papers.*

* Claude Elwood Shannon: Collected Papers, edited by N. J. A. Sloane and A. D. Wyner, IEEE Press,
New York, 1993, xliv + 924 pp. ISBN 0-7803-0434-9.


Photograph  of  Claude  Shannon  at  Bell  Labs  in  May  1952.  Caption:  "In  1952,  Claude  E. 
Shannon  of  Bell  Laboratories  devised  an  experiment  to  illustrate  the  capabilities  of 
telephone  relays.  Here,  an  electrical  mouse  finds  its  way  unerringly  through  a  maze, 
guided  by  information  remembered  in  the  kind  of  switching  relays  used  in  dial  telephone 
systems.  Experiments  with  the  mouse  helped  stimulate  Bell  Laboratories  researchers  to 
think  of  new  ways  to  use  the  logical  powers  of  computers  for  operations  other  than 
numerical  calculation." 

Photograph  of  Claude  Shannon  and  Dave  Hagelbarger  at  Bell  Labs  in  March  1955. 
Caption:  "Claude  Shannon,  the  originator  of  Information  Theory,  at  the  board  and  Dave 
Hagelbarger  work  out  some  equations  needed.  Their  current  projects  include  work  on 
automata - advanced type of computing machines which are able to perform various
thought functions."

Photograph of Claude Shannon taken in the 1980s. Photographer unknown.

Bibliography  of  Claude  Elwood  Shannon.  Comments  such  as  "Included  in  Part  B"  refer 
to  Parts  A,  B,  C,  D  of  the  Collected  Papers  mentioned  in  the  Preface. 

This  volume  contains  the  following  items.  Bracketed  numbers  refer  to  the  bibliography. 

[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender," Memorandum
MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp. + 8 figs.

[7] "A Study of the Deflection Mechanism and Some Results on Rate Finders,"
Report to National Defense Research Committee, Div. 7-311-M1, circa April,
1941, 37 pp. + 15 figs.

[9] "A Height Data Smoothing Mechanism," Report to National Defense Research
Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941, 9 pp. + 9 figs.

[11] "Some Experimental Results on the Deflection Mechanism," Report to National
Defense Research Committee, Div. 7-311-M1, June 26, 1941, 11 pp.

[12] "Criteria for Consistency and Uniqueness in Relay Circuits," Typescript, Sept. 8,
1941, 5 pp. + 3 figs.

[16]  (With  W.  Feller)  "On  the  Integration  of  the  Ballistic  Equations  on  the  Aberdeen 
Analyzer,"  Applied  Mathematics  Panel  Report  No.  28.1,  National  Defense 
Research  Committee,  July  15,  1943,  9  pp. 

[19]  "Two  New  Circuits  for  Alternate  Pulse  Counting,"  Typescript,  May  29,  1944, 
Bell  Laboratories,  2  pp.  +  3  Figs. 


[20]  "Counting  Up  or  Down  With  Pulse  Counters,"  Typescript,  May  31,  1944,  Bell 
Laboratories,  1  p.  +  1  fig. 

[21] (With B. M. Oliver) "Circuits for a P.C.M. Transmitter and Receiver,"
Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11 figs.

[23]  "Pulse  Shape  to  Minimize  Bandwidth  With  Nonoverlapping  Pulses,"  Typescript, 
August  4,  1944,  Bell  Laboratories,  4  pp. 

[24] "A Mathematical Theory of Cryptography," Memorandum MM 45-110-02, Sept.
1, 1945, Bell Laboratories, 114 pp. + 25 figs.

[26]  "Mixed  Statistical  Determinate  Systems,"  Typescript,  Sept.  19,  1945,  Bell 
Laboratories,  17  pp. 

[27]  (With  R.  B.  Blackman  and  H.  W.  Bode)  "Data  Smoothing  and  Prediction  in 
Fire-Control  Systems,"  Summary  Technical  Report,  Div.  7,  National  Defense 
Research  Committee,  Vol.  1,  Gunfire  Control,  Washington,  DC,  1946,  pp.  71-159 
and  166-167.  AD  200795.  Also  in  National  Military  Establishment  Research  and 
Development  Board,  Report  #13  MGC  12/1,  August  15,  1948.  Superseded  by 
[51]  and  by  R.  B.  Blackman,  Linear  Data-Smoothing  and  Prediction  in  Theory 
and Practice, Addison-Wesley, Reading, Mass., 1965.

[30]  (With  C.  L.  Dolph)  "The  Transient  Behavior  of  a  Large  Number  of  Four- 
Terminal  Unilateral  Linear  Networks  Connected  in  Tandem,"  Memorandum  MM 
46-110-49, April 10, 1946, Bell Laboratories, 34 pp. + 16 figs.

[31]  "Electronic  Methods  in  Telephone  Switching,"  Typescript,  October  17,  1946, 
Bell  Laboratories,  5  pp.  +  1  fig. 

[32]  "Some  Generalizations  of  the  Sampling  Theorem,"  Typescript,  March  4,  1948,  5 
pp.  +  1  fig. 

[34] "The Normal Ergodic Ensembles of Functions," Typescript, March 15, 1948, 5 pp.

[35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March 15,
1948, 2 pp.

[36] "Theorems on Statistical Sequences," Typescript, March 15, 1948, 8 pp.

[45]  "Significance  and  Application  [of  Communication  Research],"  Symposium  on 
Communication  Research,  11-13  October,  1948,  Research  and  Development 
Board,  Department  of  Defense,  Washington,  DC,  pp.  14-23,  1948. 

[46]  "Note  on  Certain  Transcendental  Numbers,"  Typescript,  October  27,  1948,  Bell 
Laboratories,  1  p. 

[47]  "A  Case  of  Efficient  Coding  for  a  Very  Noisy  Channel,"  Typescript,  Nov.  18, 
1948,  Bell  Laboratories,  2  pp. 

[48]  "Note  on  Reversing  a  Discrete  Markhoff  Process,"  Typescript,  Dec.  6  1948,  Bell 
Laboratories,  2  pp.  +  2  Figs. 


[49]     "Information  Theory,"  Typescript  of  abstract  of  talk  for  American  Statistical 
Society,  1949,  5  pp. 

[58] "Proof of an Integration Formula," Typescript, circa 1950, Bell Laboratories, 2 pp.

[59]     "A  Digital  Method  of  Transmitting  Information,"  Typescript,  no  date,  circa 
1950,  Bell  Laboratories,  3  pp. 

[72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10 pp.

[74]     (With  E.  F.  Moore)  "The  Relay  Circuit  Analyzer,"  Memorandum  MM  53-1400- 
9,  March  31,  1953,  Bell  Laboratories,  14  pp.  +  4  figs. 

[77] "Throbac - Circuit Operation," Typescript, April 9, 1953, Bell Laboratories, 7 pp.

[78]     "Tower  of  Hanoi,"  Typescript,  April  20,  1953,  Bell  Laboratories,  4  pp. 

[81]     "Mathmanship  or  How  to  Give  an  Explicit  Solution  Without  Actually  Solving 
the  Problem,"  Typescript,  June  3,  1953,  Bell  Laboratories,  2  pp. 

[84]     (With  E.  F.  Moore)  "The  Relay  Circuit  Synthesizer,"  Memorandum  MM  53- 
140-52,  November  30,  1953,  Bell  Laboratories,  22  pp.  +  5  figs. 

[87]     "Bounds  on  the  Derivatives  and  Rise  Time  of  a  Band  and  Amplitude  Limited 
Signal,"  Typescript,  April  8,  1954,  Bell  Laboratories,  6  pp.  +  1  Fig. 

[95]     "Concavity  of  Transmission  Rate  as  a  Function  of  Input  Probabilities," 
Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories.

[104] "Information Theory," Seminar Notes, Massachusetts Institute of Technology,
1956 and succeeding years. Contains the following sections:

"A skeleton key to the information theory notes," 3 pp. "Bounds on the tails of
martingales and related questions," 19 pp. "Some useful inequalities for
distribution functions," 3 pp. "A lower bound on the tail of a distribution," 9
pp. "A combinatorial theorem," 1 p. "Some results on determinants," 3 pp.
"Upper and lower bounds for powers of a matrix with non-negative elements," 3
pp. "The number of sequences of a given length," 3 pp. "Characteristic for a
language with independent letters," 4 pp. "The probability of error in optimal
codes," 5 pp. "Zero error codes and the zero error capacity C0," 10 pp.
"Lower bound for Pef for a completely connected channel with feedback," 1 p.
"A lower bound for Pe when R > C," 2 pp. "A lower bound for Pe," 2 pp.
"Lower bound with one type of input and many types of output," 3 pp.
"Application of 'sphere-packing' bounds to feedback case," 8 pp. "A result for
the memoryless feedback channel," 1 p. "Continuity of Pe opt as a function of
transition probabilities," 1 p. "Codes of a fixed composition," 1 p. "Relation of
Pe to p," 2 pp. "Bound on Pe for random code by simple threshold argument," 4
pp. "A bound on Pe for a random code," 3 pp. "The Feinstein bound," 2 pp.
"Relations between probability and minimum word separation," 4 pp.
"Inequalities for decodable codes," 3 pp. "Convexity of channel capacity as a
function of transition probabilities," 1 p. "A geometric interpretation of
channel capacity," 6 pp. "Log moment generating function for the square of a
Gaussian variate," 2 pp. "Upper bound on Pe for Gaussian channel by
expurgated random code," 2 pp. "Lower bound on Pe in Gaussian channel by
minimum distance argument," 2 pp. "The sphere packing bound for the
Gaussian power limited channel," 4 pp. "The T-terminal channel," 7 pp.
"Conditions for constant mutual information," 2 pp. "The central limit theorem
with large deviations," 6 pp. "The Chernoff inequality," 2 pp. "Upper and
lower bounds on the tails of distributions," 4 pp. "Asymptotic behavior of the
distribution function," 5 pp. "Generalized Chebycheff and Chernoff
inequalities," 1 p. "Channels with side information at the transmitter," 13 pp.
"Some miscellaneous results in coding theory," 15 pp. "Error probability
bounds for noisy channels," 20 pp.

[105]  "Reliable  Machines  from  Unreliable  Components,"  notes  of  five  lectures, 
Massachusetts  Institute  of  Technology,  Spring  1956,  24  pp. 

[106]  "The  Portfolio  Problem,  and  How  to  Pay  the  Forecaster,"  lecture  notes  taken  by 
W.  W.  Peterson,  Massachusetts  Institute  of  Technology,  Spring,  1956,  8  pp. 

[107]  "Notes  on  Relation  of  Error  Probability  to  Delay  in  a  Noisy  Channel,"  notes  of  a 
lecture,  Massachusetts  Institute  of  Technology,  Aug.  30,  1956,  3  pp. 

[108]  "Notes  on  the  Kelly  Betting  Theory  of  Noisy  Information,"  notes  of  a  lecture, 
Massachusetts  Institute  of  Technology,  Aug.  31,  1956,  2  pp. 

[124]  "The  Fourth-Dimensional  Twist,  or  a  Modest  Proposal  in  Aid  of  the  American 
Driver  in  England,"  typescript,  All  Souls  College,  Oxford,  Trinity  term,  1978,  7 
pp.  +  8  figs. 

[127]    "A  Rubric  on  Rubik  Cubics,"  Typescript,  circa  1982,  6  pp. 

Bibliography of Claude Elwood Shannon

[1] "A Symbolic Analysis of Relay and Switching Circuits," Transactions
American Institute of Electrical Engineers, Vol. 57 (1938), pp. 713-723.
(Received March 1, 1938.) Included in Part B.

[2] Letter to Vannevar Bush, Feb. 16, 1939. Printed in F.-W. Hagemeyer,
Die Entstehung von Informationskonzepten in der Nachrichtentechnik:
eine Fallstudie zur Theoriebildung in der Technik in Industrie- und
Kriegsforschung [The Origin of Information Theory Concepts in
Communication Technology: Case Study for Engineering Theory-
Building in Industrial and Military Research], Doctoral Dissertation,
Free Univ. Berlin, Nov. 8, 1979, 570 pp. Included in Part A.

[3] "An Algebra for Theoretical Genetics," Ph.D. Dissertation, Department
of Mathematics, Massachusetts Institute of Technology, April 15, 1940,
69 pp. Included in Part C.

[4] "A Theorem on Color Coding," Memorandum 40-130-153, July 8,
1940, Bell Laboratories. Superseded by "A Theorem on Coloring the
Lines of a Network." Not included.

[5] "The Use of the Lakatos-Hickman Relay in a Subscriber Sender,"
Memorandum MM 40-130-179, August 3, 1940, Bell Laboratories, 7 pp.
+ 8 figs. Included in this volume.

[7] "A Study of the Deflection Mechanism and Some Results on Rate
Finders," Report to National Defense Research Committee, Div. 7-311-M1,
circa April, 1941, 37 pp. + 15 figs. Included in this volume.

[8] "Backlash in Overdamped Systems," Report to National Defense
Research Committee, Princeton Univ., May 14, 1941, 6 pp. Abstract
only included in Part B.

[9] "A Height Data Smoothing Mechanism," Report to National Defense
Research Committee, Div. 7-313.2-M1, Princeton Univ., May 26, 1941,
9 pp. + 9 figs. Included in this volume.

[10] "The Theory of Linear Differential and Smoothing Operators," Report
to National Defense Research Committee, Div. 7-313.1-M1, Princeton
Univ., June 8, 1941, 11 pp. Not included.

[11] "Some Experimental Results on the Deflection Mechanism," Report to
National Defense Research Committee, Div. 7-311-M1, June 26, 1941,
11 pp. Included in this volume.


[12]  "Criteria  for  Consistency  and  Uniqueness  in  Relay  Circuits," 
Typescript,  Sept.  8,  1941,  5  pp.  +  3  figs.  Included  in  this  volume. 

[13]  "The  Theory  and  Design  of  Linear  Differential  Equation  Machines," 
Report to the Services 20, Div. 7-311-M2, Jan. 1942, Bell Laboratories,
73  pp.  +  30  figs.  Included  in  Part  B. 

[14]  (With  John  Riordan)  "The  Number  of  Two-Terminal  Series-Parallel 
Networks,"  Journal  of  Mathematics  and  Physics,  Vol.  21  (August, 
1942),  pp.  83-93.  Included  in  Part  B. 

[15]  "Analogue  of  the  Vernam  System  for  Continuous  Time  Series," 
Memorandum  MM  43-110-44,  May  10,  1943,  Bell  Laboratories,  4  pp.  + 
4  figs.  Included  in  Part  A. 

[16]  (With  W.  Feller)  "On  the  Integration  of  the  Ballistic  Equations  on  the 
Aberdeen  Analyzer,"  Applied  Mathematics  Panel  Report  No.  28.1, 
National  Defense  Research  Committee,  July  15,  1943,  9  pp.  Included  in 
this  volume. 

[17] "Pulse Code Modulation," Memorandum MM 43-110-43, December 1,
1943, Bell Laboratories. Not included.

[18]  "Feedback  Systems  with  Periodic  Loop  Closure,"  Memorandum  MM 
44-110-32, March 16, 1944, Bell Laboratories. Not included.

[19] "Two New Circuits for Alternate Pulse Counting," Typescript, May 29,
1944, Bell Laboratories, 2 pp. + 3 Figs. Included in this volume.

[20]  "Counting  Up  or  Down  With  Pulse  Counters,"  Typescript,  May  31, 
1944,  Bell  Laboratories,  1  p.  +  1  fig.  Included  in  this  volume. 

[21]  (With  B.  M.  Oliver)  "Circuits  for  a  P.C.M.  Transmitter  and  Receiver," 
Memorandum MM 44-110-37, June 1, 1944, Bell Laboratories, 4 pp., 11
figs.  Included  in  this  volume. 

[22] "The Best Detection of Pulses," Memorandum MM 44-110-28, June 22,
1944,  Bell  Laboratories,  3  pp.  Included  in  Part  A. 

[23]  "Pulse  Shape  to  Minimize  Bandwidth  With  Nonoverlapping  Pulses," 
Typescript, August 4, 1944, Bell Laboratories, 4 pp. Included in this
volume.

[24]  "A  Mathematical  Theory  of  Cryptography,"  Memorandum  MM  45- 
110-02,  Sept.  1,  1945,  Bell  Laboratories,  114  pp.  +  25  figs.  Superseded 
by  the  following  paper.  Included  in  this  volume. 

[25]  "Communication  Theory  of  Secrecy  Systems,"  Bell  System  Technical 
Journal,  Vol.  28  (1949),  pp.  656-715.  "The  material  in  this  paper 
appeared  originally  in  a  confidential  report  'A  Mathematical  Theory  of 
Cryptography',  dated  Sept.  1,  1945,  which  has  now  been  declassified." 
Included  in  Part  A. 


[26]  "Mixed  Statistical  Determinate  Systems,"  Typescript,  Sept.  19,  1945, 
Bell  Laboratories,  17  pp.  Included  in  this  volume. 

[27]  (With  R.  B.  Blackman  and  H.  W.  Bode)  "Data  Smoothing  and 
Prediction  in  Fire-Control  Systems,"  Summary  Technical  Report, 
Div. 7, National Defense Research Committee, Vol. 1, Gunfire Control,
Washington,  DC,  1946,  pp.  71-159  and  166-167.  AD  200795.  Also  in 
National  Military  Establishment  Research  and  Development  Board, 
Report  #13  MGC  12/1,  August  15,  1948.  Superseded  by  [51]  and  by  R. 
B.  Blackman,  Linear  Data-Smoothing  and  Prediction  in  Theory  and 
Practice, Addison-Wesley, Reading, Mass., 1965. Included in this
volume.

[28]  (With  B.  M.  Oliver)  "Communication  System  Employing  Pulse  Code 
Modulation,"  Patent  2,801,281.  Filed  Feb.  21,  1946,  granted  July  30, 
1957.  Not  included. 

[29]  (With  B.  D.  Holbrook)  "A  Sender  Circuit  For  Panel  or  Crossbar 
Telephone  Systems,"  Patent  application  circa  1946,  application  dropped 
April  13,  1948.  Not  included. 

[30]  (With  C.  L.  Dolph)  "The  Transient  Behavior  of  a  Large  Number  of 
Four-Terminal  Unilateral  Linear  Networks  Connected  in  Tandem," 
Memorandum  MM  46-110-49,  April  10,  1946,  Bell  Laboratories,  34  pp. 
+  16  figs.  Included  in  this  volume. 

[31]  "Electronic  Methods  in  Telephone  Switching,"  Typescript,  October  17, 
1946,  Bell  Laboratories,  5  pp.  +  1  fig.  Included  in  this  volume. 

[32]  "Some  Generalizations  of  the  Sampling  Theorem,"  Typescript,  March 
4,  1948,  5  pp.  +  1  fig.  Included  in  this  volume. 

[33]  (With  J.  R.  Pierce  and  J.  W.  Tukey)  "Cathode-Ray  Device,"  Patent 
2,576,040.  Filed  March  10,  1948,  granted  Nov.  20,  1951.  Not  included. 

[34]  "The  Normal  Ergodic  Ensembles  of  Functions,"  Typescript,  March  15, 
1948,  5  pp.  Included  in  this  volume. 

[35] "Systems Which Approach the Ideal as P/N → ∞," Typescript, March
15,  1948,  2  pp.  Included  in  this  volume. 

[36]  "Theorems  on  Statistical  Sequences,"  Typescript,  March  15,  1948,  8  pp. 
Included  in  this  volume. 

[37]  "A  Mathematical  Theory  of  Communication,"  Bell  System  Technical 
Journal,  Vol.  27  (July  and  October  1948),  pp.  379-423  and  623-656. 
Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the  Development  of 
Information  Theory,  IEEE  Press,  NY,  1974.  Included  in  Part  A. 

[38]  (With  Warren  Weaver)  The  Mathematical  Theory  of  Communication, 
University of Illinois Press, Urbana, IL, 1949, vi + 117 pp. Reprinted
(and repaginated) 1963. The section by Shannon is essentially identical
to the previous item. Not included.

[39]  (With  Warren  Weaver)  Mathematische  Grundlagen  der 
Informationstheorie,  Scientia  Nova,  Oldenbourg  Verlag,  Munich,  1976, 
pp.  143.  German  translation  of  the  preceding  book.  Not  included. 

[40]  (With  B.  M.  Oliver  and  J.  R.  Pierce)  "The  Philosophy  of  PCM," 
Proceedings  Institute  of  Radio  Engineers,  Vol.  36  (1948),  pp.  1324- 
1331.  (Received  May  24,  1948.)  Included  in  Part  A. 

[41]  "Samples  of  Statistical  English,"  Typescript,  June  11,  1948,  Bell 
Laboratories,  3  pp.  Included  in  this  volume. 

[42]  "Network  Rings,"  Typescript,  June  11,  1948,  Bell  Laboratories,  26  pp. 
+  4  figs.  Included  in  Part  B. 

[43]  "Communication  in  the  Presence  of  Noise,"  Proceedings  Institute  of 
Radio  Engineers,  Vol.  37  (1949),  pp.  10-21.  (Received  July  23,  1940 
[1948?].)  Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the 
Development  of  Information  Theory,  IEEE  Press,  NY,  1974.  Reprinted 
in  Proceedings  Institute  of  Electrical  and  Electronic  Engineers,  Vol.  72 
(1984),  pp.  1192-1201.  Included  in  Part  A. 

[44]  "A  Theorem  on  Coloring  the  Lines  of  a  Network,"  Journal  of 
Mathematics  and  Physics,  Vol.  28  (1949),  pp.  148-151.  (Received  Sept. 
14,  1948.)  Included  in  Part  B. 

[45]  "Significance  and  Application  [of  Communication  Research]," 
Symposium  on  Communication  Research,  11-13  October,  1948,  Research 
and  Development  Board,  Department  of  Defense,  Washington,  DC,  pp. 
14-23,  1948.  Included  in  this  volume. 

[46]  "Note  on  Certain  Transcendental  Numbers,"  Typescript,  October  27, 
1948,  Bell  Laboratories,  1  p.  Included  in  this  volume. 

[47]  "A  Case  of  Efficient  Coding  for  a  Very  Noisy  Channel,"  Typescript, 
Nov.  18,  1948,  Bell  Laboratories,  2  pp.  Included  in  this  volume. 

[48]  "Note  on  Reversing  a  Discrete  Markhoff  Process,"  Typescript,  Dec.  6 
1948,  Bell  Laboratories,  2  pp.  +  2  Figs.  Included  in  this  volume. 

[49]  "Information  Theory,"  Typescript  of  abstract  of  talk  for  American 
Statistical  Society,  1949,  5  pp.  Included  in  this  volume. 

[50]  "The  Synthesis  of  Two-Terminal  Switching  Circuits,"  Bell  System 
Technical  Journal,  Vol.  28  (Jan.,  1949),  pp.  59-98.  Included  in  Part  B. 

[51]  (With  H.  W.  Bode)  "A  Simplified  Derivation  of  Linear  Least  Squares 
Smoothing  and  Prediction  Theory,"  Proceedings  Institute  of  Radio 
Engineers,  Vol.  38  (1950),  pp.  417-425.  (Received  July  13,  1949.) 
Included  in  Part  B. 


[52]  "Review  of  Transformations  on  Lattices  and  Structures  of  Logic  by 
Stephen  A.  Kiss,"  Proceedings  Institute  of  Radio  Engineers,  Vol.  37 
(1949), p. 1163. Included in Part B.

[53]  "Review  of  Cybernetics,  or  Control  and  Communication  in  the  Animal 
and  the  Machine  by  Norbert  Wiener,"  Proceedings  Institute  of  Radio 
Engineers,  Vol.  37  (1949),  p.  1305.  Included  in  Part  B. 

[54]  "Programming  a  Computer  for  Playing  Chess,"  Philosophical 
Magazine,  Series  7,  Vol.  41  (No.  314,  March  1950),  pp.  256-275. 
(Received  Nov.  8,  1949.)  Reprinted  in  D.  N.  L.  Levy,  editor,  Computer 
Chess  Compendium,  Springer- Verlag,  NY,  1988.  Included  in  Part  B. 

[55]  "A  Chess-Playing  Machine,"  Scientific  American,  Vol.  182  (No.  2, 
February  1950),  pp.  48-51.  Reprinted  in  The  World  of  Mathematics, 
edited  by  James  R.  Newman,  Simon  and  Schuster,  NY,  Vol.  4,  1956,  pp. 
2124-2133.  Included  in  Part  B. 

[56]  "Memory  Requirements  in  a  Telephone  Exchange,"  Bell  System 
Technical Journal, Vol. 29 (1950), pp. 343-349. (Received Dec. 7,
1949.) Included in Part B.

[57]  "A  Symmetrical  Notation  for  Numbers,"  American  Mathematical 
Monthly,  Vol.  57  (Feb.,  1950),  pp.  90-93.  Included  in  Part  B. 

[58]  "Proof  of  an  Integration  Formula,"  Typescript,  circa  1950,  Bell 
Laboratories,  2  pp.  Included  in  this  volume. 

[59]  "A  Digital  Method  of  Transmitting  Information,"  Typescript,  no  date, 
circa  1950,  Bell  Laboratories,  3  pp.  Included  in  this  volume. 

[60]  "Communication  Theory  —  Exposition  of  Fundamentals,"  in  "Report 
of Proceedings, Symposium on Information Theory, London, Sept.,
1950," Institute of Radio Engineers, Transactions on Information
Theory,  No.  1  (February,  1953),  pp.  44-47.  Included  in  Part  A. 

[61]  "General  Treatment  of  the  Problem  of  Coding,"  in  "Report  of 
Proceedings,  Symposium  on  Information  Theory,  London,  Sept.,  1950," 
Institute  of  Radio  Engineers,  Transactions  on  Information  Theory,  No.  1 
(February,  1953),  pp.  102-104.  Included  in  Part  A. 

[62]  "The  Lattice  Theory  of  Information,"  in  "Report  of  Proceedings, 
Symposium  on  Information  Theory,  London,  Sept.,  1950,"  Institute  of 
Radio  Engineers,  Transactions  on  Information  Theory,  No.  1  (February, 
1953),  pp.  105-107.  Included  in  Part  A. 

[63]  (With  E.  C.  Cherry,  S.  H.  Moss,  Dr.  Uttley,  I.  J.  Good,  W.  Lawrence  and 
W.  P.  Anderson)  "Discussion  of  Preceding  Three  Papers,"  in  "Report 
of  Proceedings,  Symposium  on  Information  Theory,  London,  Sept., 
1950,"  Institute  of  Radio  Engineers,  Transactions  on  Information 
Theory,  No.  1  (February,  1953),  pp.  169-174.  Included  in  Part  A. 

[64]  "Review  of  Description  of  a  Relay  Computer,  by  the  Staff  of  the 
[Harvard]  Computation  Laboratory,"  Proceedings  Institute  of  Radio 
Engineers,  Vol.  38  (1950),  p.  449.  Included  in  Part  B. 

[65]  "Recent  Developments  in  Communication  Theory,"  Electronics,  Vol. 
23  (April,  1950),  pp.  80-83.  Included  in  Part  A. 

[66]  German  translation  of  [65],  in  Tech.  Mitt.  P.T.T.,  Bern,  Vol.  28  (1950), 
pp.  337-342.  Not  included. 

[67]  "A  Method  of  Power  or  Signal  Transmission  To  a  Moving  Vehicle," 
Memorandum  for  Record,  July  19,  1950,  Bell  Laboratories,  2  pp.  +  4 
figs.  Included  in  Part  B. 

[68]  "Some  Topics  in  Information  Theory,"  in  Proceedings  International 
Congress of Mathematicians (Cambridge, Mass., Aug. 30 - Sept. 6, 1950),
American Mathematical Society, Vol. II (1952), pp. 262-263. Included
in  Part  A. 

[69]  "Prediction  and  Entropy  of  Printed  English,"  Bell  System  Technical 
Journal,  Vol.  30  (1951),  pp.  50-64.  (Received  Sept.  15,  1950.) 
Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the  Development  of 
Information  Theory,  IEEE  Press,  NY,  1974.  Included  in  Part  A. 

[70]  "Presentation  of  a  Maze  Solving  Machine,"  in  Cybernetics:  Circular, 
Causal  and  Feedback  Mechanisms  in  Biological  and  Social  Systems, 
Transactions Eighth Conference, March 15-16, 1951, New York, N. Y.,
edited  by  H.  von  Foerster,  M.  Mead  and  H.  L.  Teuber,  Josiah  Macy  Jr. 
Foundation,  New  York,  1952,  pp.  169-181.  Included  in  Part  B. 

[71]    "Control  Apparatus,"  Patent  application  Aug.  1951,  dropped  Jan.  21, 
1954.  Not  included. 

[72] "Creative Thinking," Typescript, March 20, 1952, Bell Laboratories, 10
pp. Included in this volume.

[73] "A Mind-Reading (?) Machine," Typescript, March 18, 1953, Bell
Laboratories, 4 pp. Included in Part B.

[74] (With E. F. Moore) "The Relay Circuit Analyzer," Memorandum MM
53-1400-9, March 31, 1953, Bell Laboratories, 14 pp. + 4 figs. Included
in this volume.

[75] "The Potentialities of Computers," Typescript, April 3, 1953, Bell
Laboratories. Included in Part B.

[76] "Throbac I," Typescript, April 9, 1953, Bell Laboratories, 5 pp.
Included in Part B.

[77]    "Throbac  -  Circuit  Operation,"  Typescript,  April  9,   1953,  Bell 
Laboratories,  7  pp.  Included  in  this  volume. 


[78]  "Tower  of  Hanoi,"  Typescript,  April  20,  1953,  Bell  Laboratories,  4  pp. 
Included  in  this  volume. 

[79]  (With  E.  F.  Moore)  "Electrical  Circuit  Analyzer,"  Patent  2,776,405. 
Filed  May  18,  1953,  granted  Jan.  1,  1957.  Not  included. 

[80]  (With  E.  F.  Moore)  "Machine  Aid  for  Switching  Circuit  Design," 
Proceedings  Institute  of  Radio  Engineers,  Vol.  41  (1953),  pp.  1348- 
1351.  (Received  May  28,  1953.)  Included  in  Part  B. 

[81]  "Mathmanship  or  How  to  Give  an  Explicit  Solution  Without  Actually 
Solving  the  Problem,"  Typescript,  June  3,  1953,  Bell  Laboratories,  2  pp. 
Included  in  this  volume. 

[82]  "Computers  and  Automata,"  Proceedings  Institute  of  Radio  Engineers, 
Vol. 41 (1953), pp. 1234-1241. (Received July 17, 1953.) Reprinted in
Methodos, Vol. 6 (1954), pp. 115-130. Included in Part B.

[83]  "Realization  of  All  16  Switching  Functions  of  Two  Variables  Requires 
18  Contacts,"  Memorandum  MM  53-1400-40,  November  17,  1953,  Bell 
Laboratories,  4  pp.  +  2  figs.  Included  in  Part  B. 

[84]  (With  E.  F.  Moore)  "The  Relay  Circuit  Synthesizer,"  Memorandum 
MM  53-140-52,  November  30,  1953,  Bell  Laboratories,  26  pp.  +  5  figs. 
Included  in  this  volume. 

[85]  (With  D.  W.  Hagelbarger)  "A  Relay  Laboratory  Outfit  for  Colleges," 
Memorandum  MM  54-114-17,  January  10,  1954,  Bell  Laboratories. 
Included  in  Part  B. 

[86]  "Efficient  Coding  of  a  Binary  Source  With  One  Very  Infrequent 
Symbol,"  Memorandum  MM  54-114-7,  January  29,  1954,  Bell 
Laboratories.  Included  in  Part  A. 

[87]  "Bounds  on  the  Derivatives  and  Rise  Time  of  a  Band  and  Amplitude 
Limited  Signal,"  Typescript,  April  8,  1954,  Bell  Laboratories,  6  pp.  +  1 
Fig.  Included  in  this  volume. 

[88]  (With  Edward  F.  Moore)  "Reliable  Circuits  Using  Crummy  Relays," 
Memorandum  54-114-42,  Nov.  29,  1954,  Bell  Laboratories.  Published 
as  the  following  two  items. 

[89]  (With  Edward  F.  Moore)  "Reliable  Circuits  Using  Less  Reliable  Relays 
I,"  Journal  Franklin  Institute,  Vol.  262  (Sept.,  1956),  pp.  191-208. 
Included  in  Part  B. 

[90]  (With  Edward  F.  Moore)  "Reliable  Circuits  Using  Less  Reliable  Relays 
II," Journal Franklin Institute, Vol. 262 (Oct., 1956), pp. 281-297.
Included  in  Part  B. 

[91]  (Edited  jointly  with  John  McCarthy)  Automata  Studies,  Annals  of 
Mathematics Studies Number 34, Princeton University Press, Princeton,
NJ, 1956, ix + 285 pp. The Preface, Table of Contents, and the two
papers  by  Shannon  are  included  in  Part  B. 

[92]  (With  John  McCarthy),  Studien  zur  Theorie  der  Automaten,  Munich, 
1974.  (German  translation  of  the  preceding  work.) 

[93] "A Universal Turing Machine With Two Internal States," Memorandum
54-114-38,  May  15,  1954,  Bell  Laboratories.  Published  in  Automata 
Studies,  pp.  157-165.  Included  in  Part  B. 

[94]  (With  Karel  de  Leeuw,  Edward  F.  Moore  and  N.  Shapiro) 
"Computability  by  Probabilistic  Machines,"  Memorandum  54-114-37, 
Oct. 21, 1954, Bell Laboratories. Published in [91], pp. 183-212.
Included  in  Part  B. 

[95]  "Concavity  of  Transmission  Rate  as  a  Function  of  Input  Probabilities," 
Memorandum MM 55-114-28, June 8, 1955, Bell Laboratories. Included
in  this  volume. 

[96] "Some Results on Ideal Rectifier Circuits," Memorandum MM
55-114-29, June 8, 1955, Bell Laboratories. Included in Part B.

[97]  "The  Simultaneous  Synthesis  of  s  Switching  Functions  of  n  Variables," 
Memorandum MM 55-114-30, June 8, 1955, Bell Laboratories. Included
in  Part  B. 

[98]  (With  D.  W.  Hagelbarger)  "Concavity  of  Resistance  Functions," 
Journal  Applied  Physics,  Vol.  27  (1956),  pp.  42-43.  (Received  August  1, 
1955.)  Included  in  Part  B. 

[99] "Game Playing Machines," Journal Franklin Institute, Vol. 260 (1955),
pp.  447-453.  (Delivered  Oct.  19,  1955.)  Included  in  Part  B. 

[100]  "Information  Theory,"  Encyclopedia  Britannica,  Chicago,  IL,  14th 
Edition,  1968  printing,  Vol.  12,  pp.  246B-249.  (Written  circa  1955.) 
Included  in  Part  A. 

[101]  "Cybernetics,"  Encyclopedia  Britannica,  Chicago,  IL,  14th  Edition, 
1968  printing,  Vol.  12.  (Written  circa  1955.)  Not  included. 

[102]  "The  Rate  of  Approach  to  Ideal  Coding  (Abstract),"  Proceedings 
Institute  of  Radio  Engineers,  Vol.  43  (1955),  p.  356.  Included  in  Part  A. 

[103]  "The  Bandwagon  (Editorial),"  Institute  of  Radio  Engineers, 
Transactions  on  Information  Theory,  Vol.  IT-2  (March,  1956),  p.  3. 
Included  in  Part  A. 

[104]  "Information  Theory,"  Seminar  Notes,  Massachusetts  Institute  of 
Technology,  1956  and  succeeding  years.  Included  in  this  volume. 
Contains the following sections:

"A skeleton key to the information theory notes," 3 pp. "Bounds on the
tails of martingales and related questions," 19 pp. "Some useful
inequalities  for  distribution  functions,"  3  pp.  "A  lower  bound  on  the 
tail  of  a  distribution,"  9  pp.  "A  combinatorial  theorem,"  1  p.  "Some 
results  on  determinants,"  3  pp.  "Upper  and  lower  bounds  for  powers  of 
a  matrix  with  non-negative  elements,"  3  pp.  "The  number  of  sequences 
of  a  given  length,"  3  pp.    "Characteristic  for  a  language  with 
independent  letters,"  4  pp.  "The  probability  of  error  in  optimal  codes," 
5  pp.   "Zero  error  codes  and  the  zero  error  capacity  C0,"  10  pp. 
"Lower  bound  for  Pef  for  a  completely  connected  channel  with 
feedback,"  1  p.  "A  lower  bound  for  Pe  when  R  >  C,"  2  pp.  "A  lower 
bound  for  Pe,"  2  pp.  "Lower  bound  with  one  type  of  input  and  many 
types  of  output,"  3  pp.   "Application  of  'sphere-packing'  bounds  to 
feedback  case,"  8  pp.  "A  result  for  the  memoryless  feedback  channel," 
1 p. "Continuity of Pe opt as a function of transition probabilities," 1 p.
"Codes of a fixed composition," 1 p. "Relation of Pe to p," 2 pp.
"Bound on Pe for random code by simple threshold argument," 4 pp.
"A  bound  on  Pe  for  a  random  code,"  3  pp.  "The  Feinstein  bound,"  2 
pp.  "Relations  between  probability  and  minimum  word  separation,"  4 
pp.  "Inequalities  for  decodable  codes,"  3  pp.  "Convexity  of  channel 
capacity as a function of transition probabilities," 1 p. "A geometric
interpretation  of  channel  capacity,"  6  pp.   "Log  moment  generating 
function  for  the  square  of  a  Gaussian  variate,"  2  pp.  "Upper  bound  on 
Pe  for  Gaussian  channel  by  expurgated  random  code,"  2  pp.  "Lower 
bound  on  Pe  in  Gaussian  channel  by  minimum  distance  argument,"  2 
pp.    "The  sphere  packing  bound  for  the  Gaussian  power  limited 
channel,"  4  pp.   "The  ^-terminal  channel,"  7  pp.   "Conditions  for 
constant  mutual  information,"  2  pp.  "The  central  limit  theorem  with 
large  deviations,"  6  pp.  "The  Chernoff  inequality,"  2  pp.  "Upper  and 
lower  bounds  on  the  tails  of  distributions,"  4  pp.  "Asymptotic  behavior 
of  the  distribution  function,"  5  pp.   "Generalized  Chebycheff  and 
Chernoff  inequalities,"  1  p.  "Channels  with  side  information  at  the 
transmitter,"  13  pp.  "Some  miscellaneous  results  in  coding  theory,"  15 
pp.  "Error  probability  bounds  for  noisy  channels,"  20  pp. 

[105]  "Reliable  Machines  from  Unreliable  Components,"  notes  of  five 
lectures, Massachusetts Institute of Technology, Spring 1956, 24 pp. Not
included.

[106]  "The  Portfolio  Problem,  and  How  to  Pay  the  Forecaster,"  lecture  notes 
taken  by  W.  W.  Peterson,  Massachusetts  Institute  of  Technology,  Spring, 
1956,  8  pp.  Included  in  this  volume. 

[107]  "Notes  on  Relation  of  Error  Probability  to  Delay  in  a  Noisy  Channel," 
notes  of  a  lecture,  Massachusetts  Institute  of  Technology,  Aug.  30,  1956, 
3 pp. Included in this volume.

[108] "Notes on the Kelly Betting Theory of Noisy Information," notes of a
lecture, Massachusetts Institute of Technology, Aug. 31, 1956, 2 pp.
Included in this volume.

[109]  "The  Zero  Error  Capacity  of  a  Noisy  Channel,"  Institute  of  Radio 
Engineers,  Transactions  on  Information  Theory,  Vol.  IT-2  (September, 
1956),  pp.  S8-S19.  Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the 
Development  of  Information  Theory,  IEEE  Press,  NY,  1974.  Included  in 
Part  A. 

[110]  (With  Peter  Elias  and  Amiel  Feinstein)  "A  Note  on  the  Maximum  Flow 
Through  a  Network,"  Institute  of  Radio  Engineers,  Transactions  on 
Information  Theory,  Vol.  IT-2  (December,  1956),  pp.  117-119. 
(Received  July  11,  1956.)  Included  in  Part  B. 

[111] "Certain Results in Coding Theory for Noisy Channels," Information
and  Control,  Vol.  1  (1957),  pp.  6-25.  (Received  April  22,  1957.) 
Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the  Development  of 
Information  Theory,  IEEE  Press,  NY,  1974.  Included  in  Part  A. 

[112]  "Geometrische  Deutung  einiger  Ergebnisse  bei  die  Berechnung  der 
Kanal  Capazitat"  [Geometrical  meaning  of  some  results  in  the 
calculation  of  channel  capacity],  Nachrichtentechnische  Zeit.  (N.T.Z.), 
Vol.  10  (No.  1,  January  1957),  pp.  1-4.  Not  included,  since  the  English 
version  is  included. 

[113]  "Some  Geometrical  Results  in  Channel  Capacity,"  Verband  Deutsche 
Elektrotechniker  Fachber.,  Vol.  19  (II)  (1956),  pp.  13-15  = 
Nachrichtentechnische  Fachber.  (N.T.F.),  Vol.  6  (1957).  English  version 
of  the  preceding  work.  Included  in  Part  A. 

[114] "Von Neumann's Contribution to Automata Theory," Bulletin American
Mathematical  Society,  Vol.  64  (No.  3,  Part  2,  1958),  pp.  123-129. 
(Received  Feb.  10,  1958.)  Included  in  Part  B. 

[115]  "A  Note  on  a  Partial  Ordering  for  Communication  Channels," 
Information  and  Control,  Vol.  1  (1958),  pp.  390-397.  (Received  March 
24,  1958.)  Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the 
Development  of  Information  Theory,  IEEE  Press,  NY,  1974.  Included  in 
Part  A. 

[116]  "Channels  With  Side  Information  at  the  Transmitter,"  IBM  Journal 
Research  and  Development,  Vol.  2  (1958),  pp.  289-293.  (Received  Sept. 
15,  1958.)  Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the 
Development  of  Information  Theory,  IEEE  Press,  NY,  1974.  Included  in 
Part  A. 

[117]  "Probability  of  Error  for  Optimal  Codes  in  a  Gaussian  Channel,"  Bell 
System  Technical  Journal,  Vol.  38  (1959),  pp.  611-656.  (Received  Oct. 
17,  1958.)  Included  in  Part  A. 

[118]  "Coding  Theorems  for  a  Discrete  Source  With  a  Fidelity  Criterion," 
Institute of Radio Engineers, International Convention Record, Vol. 7
(Part 4, 1959), pp. 142-163. Reprinted with changes in Information and
Decision  Processes,  edited  by  R.  E.  Machol,  McGraw-Hill,  NY,  1960, 
pp.  93-126.  Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the 
Development  of  Information  Theory,  IEEE  Press,  NY,  1974.  Included  in 
Part  A. 

[119]  "Two-Way  Communication  Channels,"  in  Proceedings  Fourth  Berkeley 
Symposium Probability and Statistics, June 20 - July 30, 1960, edited by
J.  Neyman,  Univ.  Calif.  Press,  Berkeley,  CA,  Vol.  1,  1961,  pp.  611-644. 
Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the  Development  of 
Information  Theory,  IEEE  Press,  NY,  1974.  Included  in  Part  A. 

[120]  "Computers  and  Automation  —  Progress  and  Promise  in  the  Twentieth 
Century,"  Man,  Science,  Learning  and  Education.  The  Semicentennial 
Lectures at Rice University, edited by S. W. Higginbotham, Supplement
2  to  Vol.  XLIX,  Rice  University  Studies,  Rice  Univ.,  1963,  pp.  201-211. 
Included  in  Part  B. 

[121]  Papers  in  Information  Theory  and  Cybernetics  (in  Russian),  Izd.  Inostr. 
Lit.,  Moscow,  1963,  824  pp.  Edited  by  R.  L.  Dobrushin  and  O.  B. 
Lupanova,  preface  by  A.  N.  Kolmogorov.  Contains  Russian  translations 
of  [1],  [6],  [14],  [25],  [37],  [40],  [43],  [44],  [50],  [51],  [54]-[56],  [65], 
[68]-[70],  [80],  [82],  [89],  [90],  [93],  [94],  [99],  [103],  [109]-[111], 

[122]  (With  R.  G.  Gallager  and  E.  R.  Berlekamp)  "Lower  Bounds  to  Error 
Probability  for  Coding  on  Discrete  Memoryless  Channels  I," 
Information  and  Control,  Vol.  10  (1967),  pp.  65-103.  (Received  Jan.  18, 
1966.)  Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the  Development 
of  Information  Theory,  IEEE  Press,  NY,  1974.  Included  in  Part  A. 

[123]  (With  R.  G.  Gallager  and  E.  R.  Berlekamp)  "Lower  Bounds  to  Error 
Probability for Coding on Discrete Memoryless Channels II,"
Information  and  Control,  Vol.  10  (1967),  pp.  522-552.  (Received  Jan. 
18,  1966.)  Reprinted  in  D.  Slepian,  editor,  Key  Papers  in  the 
Development  of  Information  Theory,  IEEE  Press,  NY,  1974.  Included  in 
Part  A. 

[124]  "The  Fourth-Dimensional  Twist,  or  a  Modest  Proposal  in  Aid  of  the 
American  Driver  in  England,"  typescript,  All  Souls  College,  Oxford, 
Trinity  term,  1978,  7  pp.  +  8  figs.  Included  in  this  volume. 

[125]  "Claude  Shannon's  No-Drop  Juggling  Diorama,"  Juggler's  World,  Vol. 
34  (March,  1982),  pp.  20-22.  Included  in  Part  B. 

[126]  "Scientific  Aspects  of  Juggling,"  Typescript,  circa  1980.  Included  in 

[127]  "A  Rubric  on  Rubik  Cubics,"  Typescript,  circa  1982,  6  pp.  Included  in 
this  volume. 


Cover  Sheet  for  Technical  Memoranda 
Research  Department 

subject: The Use of the Lakatos-Hickman Relay in a
Subscriber Sender - Case 20878


1 - Patent Dept. (letter 9/27/40)
2 - W.W.Ke^all, Case File
3 - T. C. Fry
4 - A. B. Clark
5 - B. D. Holbrook
6 - G. R. Stibitz
7 - G. V. King
8 - Miss Hanle

MM- 40-130-179
date August 13, 1940
author C. E. Shannon

INDEX  NO.  S4.2 


A study is made of the possibilities of using
the Lakatos-Hickman type relay for the counting, registering,
steering, and pulse apportioning operations in
a subscriber sender. Circuits are shown for the more
important parts of the circuit where it appears that the
new type relay would effect an economy.


The Use of the Lakatos-Hickman Relay in a Subscriber Sender
Case 20878


August  15,  1940 


The Lakatos-Hickman type relay¹ using the relay springs
as part of the magnetic circuit can be used as a very economical
type of pulse counter and registration device. In fact, one such
relay with twenty moving springs can count and register up to ten
pulses, while the same operation requires at least five ordinary
relays, and some standard circuits use as many as twenty to reduce
the spring loading on the relays and the contact loading in
the pulsing circuit. It has been suggested that this new type
of relay might be used for some or all of the many counting,
steering, and registration circuits in a subscriber type sender.
The present memorandum gives some circuits for accomplishing
this. The chief problem in the design of these circuits is
that of performing the various translating operations necessary
in converting the incoming pulses into group and brush selections,
or P.C.I. pulses as the case may be, without using more contact
elements than are available on the counting relay. Two different
solutions are given here. The first was made as economical as
possible but at the cost of one disadvantage. Under certain
conditions of contact failure in the thousands or hundreds register
the sender will connect the subscriber to an incorrect number
rather than connecting to a tell-tale and giving him a busy signal.
The second circuit, which we will call the positive action
circuit², is designed to overcome this difficulty but does so at
the expense of more contacts and wiring. Some compromise between
these circuits may be the most desirable. The circuits by no
means represent a complete sender. It appears that the problems
connected with the office code (i.e. the first two or three
digits) can be handled without much difficulty. At any rate
these circuits will depend on the type of decoder used, and
would represent a second stage in the design. We have therefore
designed what might be called a "four digit sender" considering
only the problems arising in the thousands, hundreds, tens and
units digits. We also have omitted consideration of the parts
of the circuit used for control and supervisory purposes, since
these can be easily handled by existing circuits, and do not
directly involve the new type relay. Our chief purpose is to
show that the new type counter contains sufficient contact
elements for most of the steering and counting circuits of the
subscriber sender. It is always possible to add more contacts
at any stage in the new type counter by the arrangement of
springs in Fig. 1, but this would be undesirable from the
standpoint of standardization. At any rate it was found that
even in the positive action circuit, only two stages in one
register needed more contacts than are already available, and
two additional ordinary relays were introduced here to carry the
contact load.

¹ See "Circuit Analysis for Lakatos-Hickman Type Relay,"
G. R. Stibitz, MM40-150-1BO, Jan. 15, 1940, Case 20878.

² This circuit was suggested by Mr. G. V. King.

It should be pointed out that an extremely simple and
economical sender (i.e., much simpler than those given here)
could be designed using the new type counter were it not for
the peculiar translation codes involved. Thus if we could start
"from scratch" and design translation codes particularly adapted
to the characteristics of the new relay, the circuits could be
made very simple indeed. Even using the existing codes which
were constructed to simplify the present type circuits, the use
of the new counter allows a remarkable simplicity and economy.

The  circuits  were  designed  by  a  combination  of  common 
sense  and  Boolean  algebra  methods.    We  will  omit  the  details 
involved  in  their  design.    Although  it  is  possible  that  a  few 
superfluous  elements  remain,  it  is  doubtful  if  they  can  be 
simplified very much.

Figure 2 is a block diagram of the proposed sender.
In the present panel and crossbar senders, pulse counting is
done in the same circuit for each digit and the numbers transferred
from this counting circuit to a set of registering circuits,
one for each digit, through an incoming steering chain.
The registering circuits in the panel type sender consist of a
set of five ordinary relays per digit, while in the crossbar
system the A digit is registered on one or two verticals of a
crossbar switch. In Figure 2, on the other hand, each digit
has one of the new type counter relays which acts both as a
pulse counter and as a register. The incoming steering chain
steers the incoming pulses to the correct counter-register
rather than steering the number recorded by the input pulse
counter to a digit register. The input steering chain may or
may not be one of the new type counters. The steering operation
can be done with the new type counter, but it appears to
require special devices, as for example polarized springs, in
order to energize both windings of the register relays after
receiving a digit. Even using the present type of steering
chain a great simplification is possible, for only one wire,
the pulsing lead, needs to be steered to the various digit
registers, rather than the five leads of the present type
sender. Another possibility is using a new type counter to
count the groups of pulses and operate a set of relays S_A, S_B,
S_C, S_TH, S_H, S_T, S_U which come in after the A, B, C, TH, H, T,
and U digits are received and energize both coils of the
corresponding registers.
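The steering idea described above can be sketched in a few lines of Python. This is only an illustrative model, not the memorandum's circuit: the function name `register_digits`, the register labels, and the representation of each digit as a simple pulse count are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical labels for the "four digit sender": thousands, hundreds,
# tens and units. The labels are an assumption of this sketch.
DIGIT_NAMES = ["TH", "H", "T", "U"]


def register_digits(pulse_trains: List[int]) -> Dict[str, int]:
    """Steer each incoming pulse train to its own counter-register.

    pulse_trains[i] is the number of pulses received for the i-th digit.
    Each register counts its own pulses, so no separate input counter
    and no transfer of a counted number between circuits is modelled.
    """
    if len(pulse_trains) != len(DIGIT_NAMES):
        raise ValueError("expected one pulse train per digit")
    registers = {name: 0 for name in DIGIT_NAMES}
    for position, pulses in enumerate(pulse_trains):
        if not 1 <= pulses <= 10:
            raise ValueError("a ten-stage counter registers 1 to 10 pulses")
        target = DIGIT_NAMES[position]   # steering chain selects the register
        for _ in range(pulses):
            registers[target] += 1       # only the pulsing lead is steered
    return registers
```

In this model the steering step only chooses which register receives the pulses, which mirrors the remark that a single pulsing lead, rather than five leads, has to be steered.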

After the digits are registered on the new type
counters, these numbers are translated by means of the contact
interconnections into the code corresponding to the incoming
brush, incoming group, final brush, tens, and units selections,
which are represented by a ground on one of the leads in the
groups marked IB, IG, FB, T, and U, respectively. These groups
of leads are connected in sequence to the revertive pulse counter
by means of the revertive group counter. The revertive pulse
counter will be one of the new type relays and is connected in
such a way as to open the fundamental circuit and thus stop the
revertive pulsing when it reaches the first ground. The revertive
group counter or revertive steering chain, of course, steps ahead
after each group of revertive pulses through the action of a slow
release relay. This last steering operation cannot be done solely
with one of the new type relays for it is necessary to steer ten
leads in the tens and units digits. It could be done, however,
with a new type counter in conjunction with four ordinary relays.

In the case of a call to a manual office the outputs
of the digit registers are translated by a P.C.I. circuit into
the correct P.C.I. codes. This circuit, too, can make use of the
new type counter in the quadranting operation, i.e. in apportioning
four quadrants to each of the four digits to be transmitted.
This would be done with a sixteen stage counter (or if it is
desirable to have all counters with ten stages, two of these could
be connected "in series") replacing the present sequence switch.
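As a rough illustration of the quadranting operation, the sketch below maps each stage of a sixteen stage counter to a digit and a quadrant. The function name `quadrant_for_stage`, the zero-based stage numbering, and the order in which digits are taken are assumptions of the example, not details from the memorandum.

```python
from typing import Tuple


def quadrant_for_stage(stage: int) -> Tuple[int, int]:
    """Map a counter stage (0..15) to (digit index 0..3, quadrant 1..4).

    Four consecutive stages are apportioned to each of the four digits,
    so the sixteen stages cover every quadrant of every digit once.
    """
    if not 0 <= stage < 16:
        raise ValueError("a sixteen stage counter has stages 0..15")
    digit_index, quadrant = divmod(stage, 4)
    return digit_index, quadrant + 1


# Enumerate the full schedule produced by stepping the counter once.
schedule = [quadrant_for_stage(s) for s in range(16)]
```

Two ten-stage counters "in series" would provide twenty stages, of which only sixteen are needed under this mapping.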

Of course there must be an interlock between the incoming
and revertive steering chains to prevent any selection being
made before sufficient information has been received. This can
be done by fairly standard methods.

A rough comparison can be made between the relay requirements
of the present panel type sender and the design proposed
here. Omitting parts of the circuit which would be substantially
the same the requirements are listed below:


                      Panel Sender    Proposed Sender

                      Ordinary        New Type    Ordinary
Operation             Relays          Counters    Relays

Input  Counting  1*  - 

Input  Steering  It  i  • 

Registration  »•  f 

Revertive  Counting  .   *Q  t  « 

Revertive  Steering  10   L-  JL 

Total  U  T 

In addition, a sequence switch is replaced by a new type counter.
These figures are based on the positive action circuit. The
other circuit uses 6 ordinary relays. This comparison of the
numbers of relays involved shows only a small part of the saving,
however. The wiring and fundamental method of operation of the
new circuit is much simpler which tends both toward economy and,
providing the new relay can be made sufficiently reliable,
elimination of faults and errors.

It is a little more difficult to give a quantitative
comparison of the proposed sender with the present crossbar type
sender due to the differences in the types of circuit elements
involved, but it appears that the saving would be of the same order
of magnitude.

The new type counter with ten stages acts like a series
of twenty relays which come in sequentially as the two coils of
the relay are alternately energized. Thus after n pulses the
first 2n relays are operated. If, after a series of pulses only
one of the two coils on a counter remains energized we can only
be sure of the contacts on that side. It was found that under
these conditions the number of contacts available was far too
small in all of the four registers for the various translating
operations necessary. We have therefore assumed the steering
circuit should be designed in such a way as to energize both
coils of a counter after it has received its series of pulses.*
This insures the contacts on both sides and each stage then has
the equivalent of two transfer contacts and two additional
contacts somewhat similar to a switchhook connection. Thus each
stage may be considered as a relay with the contacts available
indicated in Figure 5. Our circuit diagrams are drawn from
this point of view.
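The counting behaviour described above can be modelled with a small Python sketch. The class name `CounterRelay`, its methods, and the assumption that each pulse produces exactly two alternations of the coils (one per spring) are illustrative choices and are not taken from the memorandum.

```python
class CounterRelay:
    """Toy model of a ten-stage counter behaving like twenty relays."""

    def __init__(self, stages: int = 10) -> None:
        self.stages = stages
        self.operated = 0            # springs operated so far, 0..2*stages

    def half_step(self) -> None:
        """One alternation of the coils brings in the next spring."""
        if self.operated < 2 * self.stages:
            self.operated += 1

    def pulse(self) -> None:
        """Assumed: one full pulse alternates the coils twice."""
        self.half_step()
        self.half_step()

    def count(self) -> int:
        """Number of complete pulses registered."""
        return self.operated // 2


relay = CounterRelay()
for _ in range(3):
    relay.pulse()
# After n = 3 pulses the first 2n = 6 "relays" are operated.
```

The model does not attempt to represent which contacts are reliable when only one coil remains energized; it only tracks how far the sequence has advanced.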

For the convenience of the reader we will list the
various translation codes used in the sender. The incoming
brush selection depends only on the thousands digit and is
given by the following table:

Incoming  Brush 



0,  1 
*,  * 
4.  5 

*See the memorandum "Circuit Arrangement for Counting Relay with
Mechanically Independent Contact Springs", by B. D. Holbrook,
MM-40-130-149, July 5, 1940, Case ££108-1.

The incoming group selection depends on both the
hundreds and thousands digits and is given by the following:





<  6 

<  5 

Incoming Group



The final brush selection depends only on the hundreds
digit. We have the following code:


0,  6 

1.  • 

*,  1 

3,  8 

4,  • 

Final  Brush 




P.C.I. Code for Thousands Digit

It should be remembered that an incoming brush, incoming
group, or final brush selection of n corresponds to n + 1
revertive pulses. The same remark applies to the tens and
hundreds selection.

Digits are sent to a call indicator by series of positive
and negative pulses, four for each digit. Two different
codes are used for this, one for the thousands digit and the
other for the hundreds, tens, and units. The thousands code is
an additive one based on the numbers 1, 2, 4, and 8 as follows:













Corresponding  Additive 














The sum of the numbers corresponding to the columns in which a
digit has the symbol - gives that digit, hence the additive
property of the code. In this table I, II, III, and IV refer
to the four pulses or quadrants. In the first and third quadrants
0 represents a ground and a - represents a positive pulse. In the
even quadrants 0 means a light negative pulse and the -, a heavy
negative pulse. We have chosen this representation of the code
for comparison with the P.C.I. circuit in which four leads are
grounded or not in accordance with the above table. Thus if the
digit 8 is registered in the thousands place, leads II and III in
a group I, II, III, IV are grounded. The presence or absence of
these grounds is translated into positive or negative pulses by
two relays TS and RS.

The hundreds, tens, and units P.C.I. code is also additive,
based on the numbers 1, 2, 4, 5. Using the same conventions
it is represented by the following table:

P.C.I. Code for Hundreds, Tens, and Units Digits

H, T, or              Quadrant
U Digit        I      II     III    IV

   1           -      0      0      0
   2           0      -      0      0
   3           -      -      0      0
   4           0      0      -      0
   5           0      0      0      -
   6           -      0      0      -
   7           0      -      0      -
   8           -      -      0      -
   9           0      0      -      -
   0           0      0      0      0

Numbers       (1)    (2)    (4)    (5)
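
The additive property of the two P.C.I. codes can be sketched in a few lines of Python. This is an illustration only: the function name `pci_pattern`, the largest-number-first rule, and the assignment of the numbers 1, 2, 4, 8 to quadrants I to IV in the thousands code are assumptions. The rule was chosen because it reproduces the legible rows of the hundreds, tens, and units table; the memorandum defines the codes by table, not by rule.

```python
def pci_pattern(digit, weights):
    """Return the four-quadrant pattern of a digit in an additive code.

    weights holds the additive numbers of quadrants I..IV. A '-' marks a
    quadrant whose number is included in the sum and '0' one that is not.
    Choosing the largest usable number first is an assumption.
    """
    remaining = digit
    marks = ["0"] * len(weights)
    # Visit the quadrants from the largest additive number to the smallest.
    for index in sorted(range(len(weights)), key=lambda i: -weights[i]):
        if weights[index] <= remaining:
            marks[index] = "-"
            remaining -= weights[index]
    if remaining != 0:
        raise ValueError(f"{digit} cannot be formed from {weights}")
    return " ".join(marks)


HTU_WEIGHTS = (1, 2, 4, 5)  # hundreds, tens, and units code
TH_WEIGHTS = (1, 2, 4, 8)   # thousands code (quadrant order assumed)

for d in (1, 2, 3, 4, 5, 6, 7, 8, 9, 0):
    print(d, pci_pattern(d, HTU_WEIGHTS))
```

Summing the numbers under the `-` marks of any printed row returns the digit, which is the additive property the text describes.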

The circuit for the tens or units register is shown in Figure 4.
The operation is quite obvious. In the case of a full mechanical
call, if 6 for example were dialed in the tens place, the first
six relays are locked in, which places a ground on the lead marked
6. These are connected through the revertive steering chain to
the revertive counter which reaches this ground after the seventh
revertive pulse. The presence of this ground operates a relay
which opens the fundamental circuit and stops the pulsing.
A ground is also put on leads II and III for a P.C.I. call.
The operation of the P.C.I. circuit will be described later.
The thousands and hundreds register is shown in Figure 5 for the
positive action circuit and in Figure 6 for the more economical
circuit. In Figure 5, many of the contacts do double duty,
translating both for P.C.I. and full mechanical calls. This is
done through a relay P which is operated for a manual call and
not for a mechanical call. In the hundreds register there were
not enough contacts available in the fifth and tenth stages.

The relays R and S are used to carry part of the contact load.
This circuit is designed so that one and only one of the IB, IG,
and FB leads is grounded for a given number. In case of a contact
failure none would be grounded and the corresponding commutator
would supposedly go to a telltale. In the circuit of Figure
6, on the other hand, more than one of the IB, IG, or FB leads may
be grounded at the same time. Thus if the thousands digit is 8,
both 8 and 4 in the IB group are grounded. If the back contact
on 8 failed the revertive pulse counter would not stop the pulsing
action at brush 8 as it should but would go on to the fourth brush.
However, this circuit is considerably simpler than Figure 5, and
does not appear worse from the standpoint of possible wrong
numbers than the present type of sender.

The P.C.I. circuit is shown in Figure 7. Z is a relay
which is operated in the odd quadrants and not in the even
quadrants. TS and RS are relays whose windings are connected
sequentially through the P.C.I. impulse chain to first the thousands
P.C.I. leads I, II, III, and IV, then the hundreds, etc. according
to the following table:













TS          RS

Th I        Th II
Th III      Th II
Th III      Th IV
H I         Th IV
H I         H II
H III       H II
H III       H IV
T I         H IV
T I         T II
T III       T II
T III       T IV
U I         T IV
U I         U II
U III       U II
U III       U IV
            U IV

In the odd quadrants Z is operated, placing a ground on the
fundamental ring (FR). The fundamental tip (FT) is connected
through Z to either ground or positive battery according as
TS is operated or not. This depends of course on the condition
of the P.C.I. lead to which TS is connected at the time.
Similarly in the even quadrants light or heavy voltage is
applied to FR according to the condition of RS while FT is

Figure 8 shows the revertive steering chain and
revertive pulse counter.

C. E. SHANNON

FIG. 3

FIG. 4
TENS OR UNITS REGISTER

FIG. 7
P.C.I. CIRCUIT

by

Claude E. Shannon


1. The deflection mechanism may be divided into three parts.
The first is driven by two shafts and has one shaft as
output, which feeds the second part. This unit has a single
shaft output which serves as input to the third part, whose
output is also a single shaft, used as the desired azimuth
correction.

2. The first unit is a simple integrator. Its output rate is

3. The second part is the same circuit as previous rate finders.
Its presence appears to be detrimental to the operation of
the system from several standpoints. The output e of this part
satisfies:

$$e = x + y$$


4. The third and most important part of the machine satisfies

$$S q + R \dot q + \frac{L}{\sqrt{1 - \dot q^2}}\, \ddot q = e$$

in which:

e = an input forcing function which, except for transients in
the second part and other small effects, is the function
whose rate is to be found.

$\dot q$ = the rate of e as found by the device. The output of the
mechanism is $\sin^{-1} \dot q$.

R, L, S are positive constants depending on the gear ratios,
etc. in the machine.
The mechanism therefore acts like an R, L, C circuit in which
the differential inductance is a function of the current,

$$\frac{L}{\sqrt{1 - \dot q^2}}$$

The system can be critically damped for differential
displacements near at most two values of the current.

Omitting the effect of backlash, the system is stable for any
initial conditions whatever, with a linear forcing function,
e = At + B. It will approach asymptotically and possibly with
oscillation a position where $\dot q$ is proportional to $\dot e$. An error
function can be found which decreases at a rate $-R(\dot q - \dot q_0)^2$,
$\dot q_0$ being the asymptotic value of $\dot q$.

If the system is less than critically damped, ordinary gear
play type of backlash can and will cause oscillation. This
includes play in gears, adders, lead screws, rack and pinions
and looseness of balls in the integrator carriages. The
oscillation is not unstable in the sense of being erratic, or growing
without limit, but is of a perfectly definite frequency and
amplitude. This type of backlash acts exactly like a peculiar
shaped periodic forcing function. Approximate formulas for
the frequency and amplitude of the oscillation are

$$f_0 = \frac{1}{2\pi}\sqrt{\frac{S}{L_D} - \frac{R^2}{4L_D^2}}$$

$$|\dot q| = \frac{\frac{2B_1}{\pi} + 4 f_0 B_2}{\sqrt{R^2 + \left(\omega_0 L_D - \frac{S}{\omega_0}\right)^2}}, \qquad \omega_0 = 2\pi f_0$$

$B_1$ and $B_2$ being the amounts of backlash in the two driven shafts
as measured in a certain manner.

8. Elastic deformations of shafts and plates can be divided into
two parts. One is exactly equivalent to the gear type of
backlash and may be grouped with $B_1$ and $B_2$ above. The other
has the effect of altering the parameters R, L, S of the
circuit and also adding higher order derivatives with small
coefficients. This will slightly alter the time constant and
the natural frequency of the system.

9. The manner in which the arcsin function is obtained seems to
me distinctly disadvantageous to the operation of the system
for a number of reasons, chiefly since to eliminate backlash
oscillation it requires high overdamping near $\dot q = 0$ and this
slows down the response for low target speeds.
10. The general problem of rate finding and smoothing is
considered briefly from two angles: as a problem in
approximating a certain given transfer admittance and as a problem
in finding the form of a differential equation. The first
method, based on a linear differential equation, leads to
tentative designs which I think would be an improvement over the
present one. The second method indicates the possibility of
still more improvement if non-linear equations can be
satisfactorily analyzed.


General Considerations. The deflection mechanism is a device
designed to find $\delta_3$ mechanically from the formula

$$\sin \delta_3 = \dot\Sigma_a\, t_p$$

having one shaft whose rate of turning is $\dot\Sigma_a$ and another whose
angular position is $t_p$, giving $\delta_3$ as the position of a shaft.
The system is also supposed to smooth out small errors in $\Sigma_a$.
The mechanism, as actually constructed, is shown in
Figure 1. By a rearrangement of adders, it may be drawn as shown
in Figure 2. Incidentally, the device of rearranging and combining
adder units is frequently useful in studying these systems. In
this case it both clarifies the physical operation and simplifies
the mathematical analysis. The box IV on the right of Fig. 1
represents two adders with, essentially, a common shaft. The
output is equal to the sum of the inputs with the indicated signs
prefixed. A variable associated with a shaft represents the
angular position of that shaft unless specifically stated otherwise.
Gears are omitted from the diagram but included as coefficients
in the equations. It may also be worthwhile to point out that the
best method of setting down the equation of such a system is
usually the following:

1. Considering only the integrators and function devices,
label the various shafts using the minimum number of variables,
working backward from driven to driving shafts. Thus if the
output of an integrator is labeled z, its displacement is $\dot z$ (assuming
constant disk rate). If the output of an x to ln x gear is sin u,
its input is $e^{\sin u}$. Working backwards gives the differential
instead of the integral form of the equation.

2. Now concentrate on the adders, grouping together as
many as possible, and write the equations of constraint. These
will be the equations of the system.

I find the use of electrical analogues very useful in
understanding these devices and have used throughout a notation
which emphasizes this idea.

As the machine is drawn in Fig. 2, it consists of three
independently operating units. The output of the first is a
single shaft serving as input to the second, the output of the
second a single shaft feeding the third, and the output of this
being a shaft used as $\delta_3$.

The operation is roughly as follows: Integrator I
multiplies its disk rate by its displacement, so that the rate
of turning of its output is $\dot y = k_0\, t_p\, \dot\Sigma_a$. The actual position of
this y shaft can carry no significance. It is

$$y = k_0 \int t_p\, \dot\Sigma_a\, dt + y_0$$

a variable which depends on the entire previous history of the
sighting telescopes, to say nothing of possible integrator slippage.
At two different times, with a target at the same position and
speed, this shaft would have entirely different angular positions
but the same rate of turning.

The output of integrator I feeds into the middle part
of the system which is exactly the rate finder of most older
directors. This part of the device seems to me not only
superfluous but actually detrimental to the operation. It is
equivalent to an R, L circuit (Fig. 3) with impressed voltage y and
output x, the voltage across the inductance

3. A small response h(t) for the function g(t).

High frequencies in g(t) appear practically
undiminished and in the same phase in h(t) since the
impedance is high compared to R.


$$x = \frac{L}{R} A + K e^{-\frac{R}{L} t} + h(t)$$

In adder III, x is added to y in equal proportions to give e.

$$e = y + \frac{L}{R} A + K e^{-\frac{R}{L} t} + h(t)$$

As we pointed out above, y already contains an irrelevant additive
constant, so the addition of another, $\frac{L}{R} A$, which happens to be
proportional to the target rate is of no possible significance. The
term $K e^{-\frac{R}{L} t}$ certainly is only detrimental, being an unwanted
transient. For a time I thought that the reason for the middle
part of the machine was the final term h(t). For high
frequencies this is approximately g(t), and might be used to buck out
these high frequency following errors, much as was done in some
early radio circuits to reduce a-c hum. However, a study of the
design diagrams shows that the two error functions are actually
in phase as I have indicated in the equation, so that these high
frequency errors are added, making the situation worse. Even if
the phase of x were reversed on entering adder III, I think it
doubtful whether the presence of this part of the system would be
justifiable. It would be necessary to show that the frequencies
were high so that the two actually did cancel, and also
that the disadvantages of the transient term did not overcome the
advantages obtained. Note that the middle part can function in
no way as a rate finder. The right hand part of the machine does
its own rate finding as we will see, and the rate found by the
middle part could not possibly be used because of the undetermined
constant in y.
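
The statement that high frequencies pass through the middle part practically undiminished and in phase can be checked with the steady-state voltage ratio of a series R, L circuit. The sketch below is an illustration under assumptions: the function name and the values R = L = 1 are hypothetical, and the standard phasor ratio jωL / (R + jωL) for the voltage across the inductance is used in place of anything specific to the report.

```python
import cmath
import math


def inductor_voltage_ratio(omega, R, L):
    """Complex ratio of the inductor voltage to the impressed voltage
    in a series R, L circuit at angular frequency omega."""
    jwL = 1j * omega * L
    return jwL / (R + jwL)


R, L = 1.0, 1.0  # illustrative values, not taken from the report
for omega in (0.01, 1.0, 100.0):
    H = inductor_voltage_ratio(omega, R, L)
    gain = abs(H)
    phase_deg = math.degrees(cmath.phase(H))
    print(f"omega={omega:7.2f}  gain={gain:.4f}  phase={phase_deg:6.2f} deg")
```

At frequencies where ωL is large compared with R the gain approaches one and the phase shift approaches zero, so the error component appears in x nearly unchanged; at low frequencies the gain is small.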

We proceed now to the third part of the machine which
is the major concern of the study. Concentrating on the adder IV,
the equation of the system is obviously

$$L \frac{d}{dt} \sin^{-1} \dot q = e - S q - R \dot q$$

or

$$S q + R \dot q + \frac{L}{\sqrt{1 - \dot q^2}}\, \ddot q = e$$

This is the equation of a series R, L, C circuit with the
inductance a function of the current passing through it.
Inductance $\Lambda$ may be defined by the Lagrangian equations or by

[equation not legible in the scan]

and it is clear from the above equation that

$$\Lambda\, i = L \sin^{-1} i$$

or

$$\Lambda = \frac{L \sin^{-1} i}{i}$$

This function varies as shown in Figure 4. For our work, however,
a more useful parameter is what is sometimes called the differential
inductance $L_D$, which may be defined by

$$L_D = \frac{d(\Lambda i)}{di}$$

so that in our case

$$L_D = \frac{L}{\sqrt{1 - i^2}}$$

This inductance is useful when we have an equilibrium current $\dot q_0$
and are considering the effect of small variations about this
equilibrium. Omitting second order terms the system will be equivalent
to one with constant R, L, C parameters, the inductance being
taken as $L_D$. The variation of $L_D$ with current is shown in Figure 5.

The action is the opposite of that of a "swinging" choke where,
because of saturation, the differential inductance decreases with
large currents.
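
As a numerical illustration, the sketch below takes the flux as L sin⁻¹ i, as stated above, and treats the differential inductance as its derivative with respect to current, which is the usual definition and is assumed here. The function names and the value L = 1 are hypothetical.

```python
import math


def flux(i, L):
    """Flux of the analogue inductance, L * arcsin(i), valid for |i| < 1."""
    return L * math.asin(i)


def differential_inductance(i, L):
    """Derivative of the flux with respect to current: L / sqrt(1 - i^2)."""
    return L / math.sqrt(1.0 - i * i)


L = 1.0   # illustrative value
h = 1e-6  # step for the central difference
for i in (0.0, 0.5, 0.9):
    numeric = (flux(i + h, L) - flux(i - h, L)) / (2 * h)
    print(i, differential_inductance(i, L), numeric)
```

The printed values grow with the current, the opposite of a saturating choke, which is the behaviour the text attributes to Figure 5.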

The mechanical idea behind the operation of this system
is quite simple. Suppose shaft e to be turning at a constant rate.
The system will be in equilibrium if the displacement of integrator V
is such as to make its output feeding into the adder equal and
opposite to e, and the displacement of integrator VI at zero. Under
these conditions, shaft $\dot q$ measures the rate of e and shaft V, the
output of the device, the arcsin of this rate. If the rates are
not correct, the adder changes the second derivative shaft in
such a direction as to equalize the rates. The $\dot q$ shaft serves as
a damper to prevent continual oscillation about the equilibrium

MATHEMATICAL  THEORY  (Backlash  not  Present) 

Differential  Operation 

If e is turning at a constant rate and the system is at
equilibrium, and then a small differential disturbance is applied
to the system, it will clearly respond very nearly like an R, L,
C circuit with constant parameters, the inductance used being the
differential inductance for the equilibrium current

$$L_{eff} = \frac{L}{\sqrt{1 - \dot q_0^2}}$$

Such a system has a time constant of

$$T = \frac{2 L_{eff}}{R} = \frac{2L}{R\sqrt{1 - \dot q_0^2}}$$

It is critically damped if

$$R^2 = 4 L_{eff} S = \frac{4 L S}{\sqrt{1 - \dot q_0^2}}$$

which, of course, only occurs at

$$\dot q_0 = \pm\sqrt{1 - \frac{16 L^2 S^2}{R^4}}$$

For values of $\dot q$ greater in absolute value than this, the system is
oscillatory, for values less, overdamped.
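
The time constant and the critical damping condition can be worked numerically. The sketch below is an illustration under assumptions: the function names and the constants R = 2, L = 0.5, S = 1 are hypothetical, and the critical rate is obtained by solving R² = 4 L_eff S for the equilibrium rate, with L_eff the differential inductance at that rate.

```python
import math


def effective_inductance(L, qdot0):
    """Differential inductance at the equilibrium rate qdot0 (|qdot0| < 1)."""
    return L / math.sqrt(1.0 - qdot0 ** 2)


def time_constant(R, L, qdot0):
    """Time constant 2 * L_eff / R of the linearized system."""
    return 2.0 * effective_inductance(L, qdot0) / R


def critical_rate(R, L, S):
    """|qdot0| at which R^2 = 4 * L_eff * S, or None if there is none."""
    ratio = 4.0 * L * S / R ** 2  # equals sqrt(1 - qdot0^2) at critical damping
    if ratio > 1.0:
        return None               # oscillatory at every equilibrium rate
    return math.sqrt(1.0 - ratio ** 2)


def regime(R, L, S, qdot0):
    """Classify the linearized response about the equilibrium rate qdot0."""
    discriminant = R ** 2 - 4.0 * effective_inductance(L, qdot0) * S
    if discriminant > 0:
        return "overdamped"
    if discriminant < 0:
        return "oscillatory"
    return "critically damped"


R, L, S = 2.0, 0.5, 1.0  # illustrative constants, not from the report
print(critical_rate(R, L, S))
print(regime(R, L, S, 0.5), regime(R, L, S, 0.95))
print(time_constant(R, L, 0.0))
```

With these constants the system is overdamped at low rates and oscillatory at high rates, and the two critical rates are the positive and negative values of the number returned by `critical_rate`.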

Proof of General Stability with Linear e

In proving the stability of this system, I have used a
method which may be new in some respects. It was suggested by the
fact that in a non-dissipative mechanical system, the potential
energy U is a minimum at a point where the system is differentially
stable, and the method is, in a sense, a generalization of that
criterion. It is not, however, limited to differential stability,
or to non-dissipative systems. Since the method may be of use in
other investigations of this type, I will first describe it in
general terms.

Suppose we have a differential equation system in which
n variables and derivatives may be specified independently in the
initial conditions. We will say that the system is stable for all
initial conditions and all driving functions if any two solutions
of the system with the same driving functions approach each other
in the sense that

$$\lim_{t \to \infty} \sum_{i=1}^{n} |x_i - y_i| = 0$$

where $x_1(t), x_2(t), \ldots, x_n(t)$ is one solution and $y_1(t), \ldots, y_n(t)$ the
other. If this limit is zero for certain types of driving functions,
we will say the system is stable for these functions.
Theorem: If a continuous function $Q(x_1 \ldots x_n, y_1 \ldots y_n, t)$ can be
found having the following properties

1. $Q \ge 0$ for all $x_i$, $y_i$, t, the equality holding if and
only if $x_i = y_i$.

2. $\frac{dQ}{dt} \le 0$ at all times, when the $x_i$ and $y_i$ are solutions
of the system, with the same driving function.

3. It is impossible for Q to remain indefinitely $\ge A > 0$.

Then the system is completely stable.

For the function Q is non-increasing but always $\ge 0$ and
must therefore approach a limit $A \ge 0$ as $t \to \infty$, but by 3, $A > 0$
is impossible, hence A = 0, and each $|x_i - y_i| \to 0$.

Conversely, it can be shown that if only a single
forcing function is involved, and the system is stable for this
function, a Q exists of the type described.

Roughly, the method is to find a "distance" or "error"
function Q between two solutions which is zero only when the
solutions are identical and which always decreases.

As an example of this method it is easy to prove the
complete stability of the ordinary R, L, C circuit with constant
parameters without solving the equation. The differential equation is

$$S q + R \dot q + L \ddot q = e$$

and we choose q and $\dot q$ as coordinates. Let two solutions be $q_1$,
$\dot q_1$ and $q_2$, $\dot q_2$ and consider the function
$Q = \frac{S}{2}(q_1 - q_2)^2 + \frac{L}{2}(\dot q_1 - \dot q_2)^2$.
Condition 1 is obviously satisfied. Now

$$\frac{dQ}{dt} = S(q_1 - q_2)(\dot q_1 - \dot q_2) + L(\dot q_1 - \dot q_2)(\ddot q_1 - \ddot q_2) = -R(\dot q_1 - \dot q_2)^2 \le 0$$
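
The behaviour of this error function can be observed numerically. The sketch below integrates two solutions of the constant-parameter R, L, C equation with the same driving function and records Q along the way. It is an illustration under assumptions: the constants, the driving function, the initial conditions, and the fourth-order Runge-Kutta integrator are all hypothetical choices and not part of the report.

```python
import math

S, R, L = 1.0, 0.4, 1.0  # illustrative constants


def drive(t):
    """Common driving function e(t); any bounded-rate function would do."""
    return math.sin(0.7 * t) + 0.2 * t


def deriv(t, state):
    """Right-hand side of S q + R q' + L q'' = e(t) as a first-order system."""
    q, qd = state
    return (qd, (drive(t) - S * q - R * qd) / L)


def shifted(state, slope, factor):
    return tuple(s + factor * k for s, k in zip(state, slope))


def rk4_step(t, state, dt):
    k1 = deriv(t, state)
    k2 = deriv(t + dt / 2, shifted(state, k1, dt / 2))
    k3 = deriv(t + dt / 2, shifted(state, k2, dt / 2))
    k4 = deriv(t + dt, shifted(state, k3, dt))
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def error_function(x, y):
    """Q = S/2 (q1 - q2)^2 + L/2 (q1' - q2')^2 for two states (q, q')."""
    return 0.5 * S * (x[0] - y[0]) ** 2 + 0.5 * L * (x[1] - y[1]) ** 2


def run(t_end=60.0, dt=0.01):
    x, y = (1.0, 0.0), (-0.5, 0.8)  # two different initial conditions
    t = 0.0
    history = [error_function(x, y)]
    for _ in range(int(round(t_end / dt))):
        x, y = rk4_step(t, x, dt), rk4_step(t, y, dt)
        t += dt
        history.append(error_function(x, y))
    return history


history = run()
print(history[0], history[-1])
```

Q starts at a positive value fixed by the initial conditions and falls toward zero without ever rising, which is what conditions 1 and 2 of the theorem require.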

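$$\cdots + \frac{S}{2}\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right)^2$$

Obviously the minimum of Q with respect to q occurs at

$$q = \frac{At}{S} + \frac{B}{S} - \frac{RA}{S^2}$$

Also

$$\frac{\partial Q}{\partial \dot q} = \frac{L\left(\dot q - \frac{A}{S}\right)}{\sqrt{1 - \dot q^2}}$$

which vanishes only for $\dot q = \frac{A}{S}$. It is readily verified that this
is a minimum, and that Q is zero at this point for any t. Now

$$\frac{dQ}{dt} = \frac{\partial Q}{\partial q}\,\dot q + \frac{\partial Q}{\partial \dot q}\,\ddot q + \frac{\partial Q}{\partial t} = S\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right)\left(\dot q - \frac{A}{S}\right) + \left(\dot q - \frac{A}{S}\right)\frac{L\,\ddot q}{\sqrt{1 - \dot q^2}}$$

and

$$\frac{L\,\ddot q}{\sqrt{1 - \dot q^2}} = At + B - S q - R \dot q$$

if q and $\dot q$ satisfy

$$S q + R \dot q + \frac{L}{\sqrt{1 - \dot q^2}}\,\ddot q = At + B.$$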

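$$\frac{dQ}{dt} = \left(S q - At - B + \frac{RA}{S}\right)\left(\dot q - \frac{A}{S}\right) + \left(\dot q - \frac{A}{S}\right)(At + B - S q - R \dot q) = -R\left(\dot q - \frac{A}{S}\right)^2 \le 0$$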

Note that this rate is identical with that found in the linear case.
Incidentally, it was by working backward from this rate that a
suitable function Q was first found.
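
The convergence that the error function guarantees can be observed by integrating the non-linear equation directly with a linear forcing function e = At + B. The sketch below is an illustration under assumptions: the constants, the zero initial conditions, and the Runge-Kutta integrator are hypothetical, the stop on the sine gear is not modeled, and the expected limits follow from substituting a linear q into the equation with zero second derivative.

```python
import math

S, R, L = 1.0, 1.0, 0.5  # illustrative constants
A, B = 0.3, 0.0          # forcing function e = A*t + B, with |A/S| < 1


def deriv(t, q, qd):
    """Solve S q + R q' + L q'' / sqrt(1 - q'^2) = A t + B for q''."""
    qdd = (A * t + B - S * q - R * qd) * math.sqrt(1.0 - qd * qd) / L
    return qd, qdd


def rk4_step(t, q, qd, dt):
    k1 = deriv(t, q, qd)
    k2 = deriv(t + dt / 2, q + dt / 2 * k1[0], qd + dt / 2 * k1[1])
    k3 = deriv(t + dt / 2, q + dt / 2 * k2[0], qd + dt / 2 * k2[1])
    k4 = deriv(t + dt, q + dt * k3[0], qd + dt * k3[1])
    q += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    qd += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return q, qd


def simulate(t_end=30.0, dt=0.001):
    t, q, qd = 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        q, qd = rk4_step(t, q, qd, dt)
        t += dt
    return t, q, qd


t, q, qd = simulate()
print(qd, A / S)                                  # rate against its limit
print(q, A * t / S + B / S - R * A / S ** 2)      # position against its limit
```

The rate settles at A/S and the position at the linear function At/S + B/S - RA/S², the point at which the rate of decrease of Q vanishes.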

For Q to approach a limit K > 0, it is necessary for $\dot Q$
to approach zero, and q therefore, to approach a linear function
of t differing by a constant from its equilibrium value. But from
the original differential equation $\ddot q$ must approach a constant different
from zero, which contradicts $\dot Q \to 0$. This does not, however, quite
complete the stability proof due to a certain mechanical peculiarity of the
system. Let us plot the equilevel lines of Q against axes
$X = \left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right)$ and $Y = \dot q$. (Figure 6).

The x to sin x gear in the actual mechanism has a limited
movement, and is prevented from going too far by a slip clutch and
stop. If $|\dot q| = K$, the stop prevents $|\dot q|$ from increasing any more.
The original equation is replaced by

[equation not legible in the scan]

until the pressure on the stop reverses. So far we have shown that
under the original equation Q always decreases. In terms of our
plot this means that if we start a solution inside the curve marked C,
the solution will certainly converge to the equilibrium position, for
the solution can never "escape" from C and hit one of the two lines
$Y = \pm K$, where the differential equation changes. When we are not on

one of these lines a solution will, in fact, spiral inward in the
clockwise sense, as may be seen by writing the differential equation
in the form

$$S\left(q - \frac{At}{S} - \frac{B}{S} + \frac{RA}{S^2}\right) + R\left(\dot q - \frac{A}{S}\right) = -\frac{L}{\sqrt{1 - \dot q^2}}\,\ddot q$$

Consider the signs of $\ddot q$ and $(\dot q - A/S)$ in the four quadrants about the
equilibrium position. In I for example $(\dot q - A/S) > 0$ and the X
coordinate of a solution must increase with t; $\ddot q < 0$ so $\dot q$ must decrease,
giving a clockwise sense to the motion. Similarly the other quadrants
may be verified. Some of the solutions starting outside of C will hit one of
the lines, but the solution will still be stable. It is easy to show,
by a study of the signs of the variables and their rates, that a
solution can only hit the upper line to the left of the point $P_1$ with
coordinates $X = \frac{R}{S}\left(\frac{A}{S} - K\right)$ and $Y = K$, and that if one does, it will
move along the line to the right until it reaches $P_1$ and then return
to the original equation. A similar situation holds for the lower
line. If we should start a solution on the upper line to the right
of $P_1$ it would leave the line immediately. The solution is always
horizontal (i.e. $\ddot q = 0$) on the line through $P_1$, the equilibrium
point and $P_2$.

If R = 0 the function Q is constant since $\frac{dQ}{dt} = 0$ and
therefore the solutions of the equation

$$S q + \frac{L}{\sqrt{1 - \dot q^2}}\,\ddot q = At + B$$

are the equilevel curves in Figure 6.

I have attempted in several different ways to generalize
this proof for arbitrary input functions e(t), but so far have
no completely rigorous proof. However, some of the arguments
come so near as to make me almost certain of complete stability.
It can be shown, for example, that two different solutions with
the same e(t) cannot definitely diverge; i.e. $|q_1 - q_2| + |\dot q_1 - \dot q_2|$
cannot become and remain greater than some positive constant
(assuming e and $\dot e$ bounded). Also if two solutions get close
together (with respect to both q and $\dot q$), they will certainly
converge.

The Effect of Backlash

In order to understand how backlash can cause oscillation,
let us first consider a much simplified case. Suppose we have a
second order linear system which is less than critically damped with
no backlash (Figure 7).

$$S q + R \dot q + L \ddot q = e$$

If, at t = 0 we suddenly impress e = E (constant) on the system
($q = \dot q = 0$), the response is a damped oscillation (Figure 8).

Now in the mechanical system there are only two driven shafts,
A and B, and backlash only affects (directly) the operation
of these. Probably the greatest backlash is in the adder system
driving shaft A. Let us assume for a moment that this is the
only backlash present and that its action is as follows: when
shaft A reverses direction (i.e. when $\dddot q = 0$) there is a short
period during which it is held stationary, [illegible] $B_1$ as
measured from the e shaft. It is clear that the response of the
system with backlash is the same as the response would be if there
were no backlash and at the time of reversal ($\ddot q$ previously
decreasing, about to increase) we turn the e shaft an amount $-B_1$ and
in such a way as to keep $\ddot q$ constant during this turning.
Similarly at the next reversal we give e an increment $B_1$, keeping
$\ddot q$ constant through the period of backlash. In other words, the
response is that of a system without backlash on which we impress
as forcing function a wave which is about as shown in Figure 9.

If the periods of backlash are comparatively short, the small
connecting portions (actually quadratic polynomials in time)
will have little effect on the response. That is, we can assume
a square topped wave with little error in $\dot q$ or q especially, due
to the smoothing operation of the integrators (or, said another
way, due to the high impedance of the circuit to high frequencies).

Now suppose that there is a certain amount of backlash
in shaft B. The action of this is to cause the carriage of the
upper integrator to remain stationary for a small period when
$\ddot q = 0$. The same effect would be achieved if, at this time, we
suddenly impressed on e a pulse which held the lower integrator
at zero and kept changing e at such a rate as to keep the lower
integrator there. We keep the integrator at zero long enough so
that its output would have turned an amount equal to the backlash
in B and then suddenly return it to its proper value. This means
that the area of the pulse must equal the backlash. The shape of
this pulse would be a linear function of time, but here again it
is not highly significant.

The entire system may thus be replaced by one which is
free of backlash and subject to a driving function of the type
shown in Figure 10, where $B_1$ is the backlash in A as measured
from e and $B_2$ is the amount in B as measured from e (in the sense
that if e covers an area $B_2$, shaft B moves an amount equal to its
backlash).

It  is  easy  to  see  from  our  diagram  that  this  forcing 
function  is  in  the  correct  phase  to  sustain  the  oscillation 
of  decay. 

The fundamental component of this forcing function is
easily found. We have

$$A_1 = \frac{2}{T}\int_0^T e \sin\frac{2\pi t}{T}\, dt$$

e may be split into a sum, one term for the square $B_1$ wave and
one for the pulse-like $B_2$ part. The $B_2$ pulse is all concentrated
near the center of the sine wave where it is nearly unity. Hence

$$A_1 = \frac{4}{T}\int_0^{T/2} \frac{B_1}{2} \sin\frac{2\pi t}{T}\, dt + \frac{4 B_2}{T} = \frac{2 B_1}{\pi} + 4 f_0 B_2$$

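The period T of this oscillation is the natural damped period
of the system, to within a small error of size comparable to the
length of time during which backlash is effective. Hence its
frequency is approximately

$$f_0 = \frac{1}{2\pi}\sqrt{\frac{S}{L_D} - \frac{R^2}{4 L_D^2}}$$

and the magnitude of the fundamental component of the response $\dot q$ is

$$|\dot q| = \frac{\frac{2 B_1}{\pi} + 4 f_0 B_2}{\sqrt{R^2 + \left(\omega_0 L_D - \frac{S}{\omega_0}\right)^2}}, \qquad \omega_0 = 2\pi f_0$$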

Providing the quantity $\frac{2 B_1}{\pi} + 4 f_0 B_2$ is small, the
deflection mechanism will behave linearly about its equilibrium
position and the above formulae would approximately hold. If
$|\dot q_0| \ne 0$ the equilibrium value of inductance $L_D$ would
probably be as good as any to use since the differential inductance
is greater on one side and less on the other. At $\dot q = 0$ the inductance
is greater on each side and a somewhat higher value should be used,
depending on $\frac{2 B_1}{\pi} + 4 f_0 B_2$. If the system is more than critically
damped, q may or may not have an inflection point depending on the
initial conditions. If they are such that the driven shafts do
not reverse, backlash cannot take effect and there should be no
oscillation. However, if they do reverse once, the system may
receive the equivalent of a "kick" in such a direction as to
cause another reversal and so on, so that oscillation is set up.
This problem has not been very well decided but if this happens,
the amplitude formula above should still hold, while the frequency
formula will not.


The question of "spring backlash", i.e. undesired effects
due to elastic deformations of shafts and mounting plates, has been
raised. According to Hooke's Law the angular strain in a shaft
is proportional to the applied torque. This torque in a shaft
whose position is x(t) can probably be very well approximated by
an equation of the form

$$T = \pm K_1 + K_2 x' + K_3 x'',$$

the first term, whose sign is that of x', being due to a coulomb
friction load, the second to a viscous friction load and the third
an accelerating torque.

It is clear that the coulomb friction term $K_1$ can be
combined with the ordinary gear type backlash treated above, and
acts, therefore, like a periodic forcing function. The effect of
the other terms is quite different; their presence causes small
changes in the parameters of the circuit and also
adds higher derivatives to the equation. Let us consider only the
spring in the shafts feeding $\frac{L}{\sqrt{1-\dot{q}^2}}\,\ddot{q}$ (i.e. assume q driven

(Sq  -  P1  q  -  Pz  q) 
(R  4    -  fx  q  -  ig  «') 



Sq   +  (R-Pi)  q 

'F2  -    *1.  1 

-    r2  V  =  (e-  «x  i  -  a2e)  -  eX(t) 

Spring in the drive to q has a similar effect although
complicated by the non-circular sine gears.

If e is a linear function of t, so is $e_x$ and the forcing
function thus contains nothing to create a sustained oscillation.
The left-hand side differs only by small quantities from the ideal

$$S q + R\,\dot{q} + \frac{L}{\sqrt{1-\dot{q}^2}}\,\ddot{q} = e_x$$

and will therefore surely approach the solution

Thus we see that the "spring type" of backlash cannot cause
sustained oscillation as the "gear" type of backlash can. However,
if the gear type is present, the spring type can aid oscillation
by reducing the damping. It may be necessary to overdamp in some
cases in order to get an effective critical damping.

It should be pointed out that the gear type of backlash
may not be quite as simple as we have assumed, particularly in the
shafts driving $\frac{L}{\sqrt{1-\dot{q}^2}}\,\ddot{q}$. If the integrator carriage load is large
compared to the friction loads in the adders and gears, then we
are probably justified in assuming that gear pressures in the
drive only reverse when the driven shaft reverses. However, if
this is not the case, a backlash effect can easily take place at
other times, for example when one of the shafts feeding the adder
reverses, without necessarily reversing the driven shaft.

The situation could become quite complicated, the equivalent input
function containing several different sized steps occurring at
different times. However, the fundamental frequency should still
be approximately the natural damped frequency of the system,
providing the backlash effects are small and occur only during a small
fraction of the time.

The fact that backlash can cause a sustained oscillation
leads to a criticism of the design of the mechanism, in particular
to the method whereby the arcsin function is obtained. Note that
reducing the amount of gear backlash $4 f_0 B_2$ will reduce the
amplitude of oscillation proportionately, but apparently the only
way to eliminate it completely is to at least critically damp
the system for all equilibrium points, so that the shafts do not,
in general, reverse direction. In the deflection mechanism as
it stands, this would be distinctly disadvantageous, for if we
critically damp at the maximum values of $|\dot{q}|$ (the governing
points) the system will be much over-damped near $\dot{q} = 0$, and in
fact for most values of $\dot{q}$ due to the shape of the inductance

Another related argument against the manner of getting
the arcsin is that the response to high frequency error functions
depends on the value of q. It seems to me that the treatment of
error functions should be independent of the target speed -
what is best for one will be best for another - since the prediction
error we can tolerate is an absolute quantity, not dependent on the
target speed. There may be some objection to this argument on the
grounds that at higher target speeds the error function is apt to
be larger, and hence the circuit should have a larger impedance,
but even so it would only be accidental if the peculiar variation
introduced by the sine gear was anything like an approximation to
the desired variation.

Finally, a minor argument against the position of the
sine gear is that the equation becomes so difficult to handle
mathematically. A design of this type must be largely intuitive
or experimental - there is not much chance of choosing the
constants for the best operation by a mathematical formulation, or of
determining the speed of response etc. analytically.

These  difficulties  might  be  avoided  in  several  ways.  The 
arcsin  might,  for  example,  be  introduced  as  in  Figure  11. 

No doubt the reason this was not done was because with |x| near
1, running the sin x gear backward is not mechanically practical,
the gearing up ratio being too great. This objection could be
overcome in two ways - either a new gear K arcsin x to x (K large)
could be used and the parameters R, L, S all decreased by a factor
of K (or the integrator disks might be speeded up in suitable
ratios), or, if this were not mechanically feasible, a rapid
response servo mechanism could be introduced in the output, Figure 12.

This system can, by the way, be solved in closed analytic form
when $\dot{e}$ is a constant, and reduced to a quadrature in any case.
The essential feature of this circuit is that the functions of
rate finding and smoothing, and of taking the arcsin have been
isolated. Each part can be designed to do its own job the best
without compromise. It may be noted that the arcsin circuit
above also performs a smoothing operation which depends on target
speed. By suitable choice of the parameters we can make this
large or small as we desire.

The Ideal Rate Finder and Smoother

Let us consider the problem of rate finding and smoothing
from a general standpoint and ask what mathematical operation
a machine should perform to act as the "best possible" rate
finder. Of course, this question has many answers, depending
chiefly on what assumptions we make as to the input function,
and what mathematical limitations we put on the machine. We
shall assume throughout that the input function e(t) consists of
a series of linear parts with curved connecting portions and with
a small superimposed error function, and that we only desire the
rate during (that is, some time after the start of) a linear part.
In this section we assume there are no limitations whatever on the
machine - that we can build a machine to perform any operations we
can ascribe, in particular those a mathematician might use to
solve the problem. Now there is considerable experimental and
theoretical justification to the theory that the best way to fit
a curve of a given type to a set of points subject to an
observational error is in the least square sense. If we assume this to
be true in our case, and attempt to fit a straight line to the
last a seconds before $t_1$ of the curve e(t), we must minimize the
integral

$$I = \int_{t_1-a}^{t_1} \left[e - (At+B)\right]^2 dt$$

with respect to A and B. The quantity a represents the length of
the curve used in the fitting process. We would like to use as
much of the curve as actually represents a linear segment to get the
best accuracy, but certainly no more. A person doing the curve
fitting could look at e(t) and see fairly well where the curve
showed a real tendency to depart from linearity, and select
accordingly. Mathematically it could be done as follows. Suppose the
standard deviation of the error is $\sigma$ and that errors of more than
say $4\sigma$ are almost certainly due to a significant departure from
linearity in the curve. We could choose a such that it is as large
as possible without making the error |e - (At+B)| (A, B chosen to
minimize I), for $t_1 - a \le t \le t_1$, greater than $4\sigma$. In other
words we use as much of the curve as we can assume linear within
observational errors. As a final refinement of the solution it might
be desirable to include a weighting function W(t,a) in the integral I,
weighting the more recent values more heavily. The final evaluation of
the rate is then the value of A given when we minimize the function

$$I(A,B,a) = \int_{t_1-a}^{t_1} \left[e - (At+B)\right]^2 W(t,a)\,dt$$

on A and B, a fixed, giving A and B as functions of a, and then
choose a as large as possible with

$$|e - (At+B)| \le 4\sigma, \qquad t_1 - a \le t \le t_1.$$
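The procedure just described can be sketched for sampled data. This is a minimal illustration rather than anything proposed in the report: the curve e(t) is assumed to be available as equally trustworthy samples, the window is grown one sample at a time and stopped at the first violation of the 4-sigma bound, and the names `weighted_line_fit`, `rate_from_recent_history`, and the test data are all hypothetical.

```python
def weighted_line_fit(ts, es, ws):
    """Minimise sum w * (e - (A*t + B))**2 over A and B; returns (A, B)."""
    sw = sum(ws)
    swt = sum(w * t for w, t in zip(ws, ts))
    swe = sum(w * e for w, e in zip(ws, es))
    swtt = sum(w * t * t for w, t in zip(ws, ts))
    swte = sum(w * t * e for w, t, e in zip(ws, ts, es))
    det = sw * swtt - swt * swt
    a = (sw * swte - swt * swe) / det
    b = (swe - a * swt) / sw
    return a, b

def rate_from_recent_history(ts, es, sigma, weight=lambda age: 1.0, min_points=3):
    """Grow the fitting window backwards from the latest sample while every
    residual stays within 4*sigma. Returns (A, B, samples_used), or None if
    even the shortest window violates the bound."""
    best = None
    for n in range(min_points, len(ts) + 1):
        wt, we = ts[-n:], es[-n:]
        ws = [weight(ts[-1] - t) for t in wt]   # weight as a function of age
        a, b = weighted_line_fit(wt, we, ws)
        worst = max(abs(e - (a * t + b)) for t, e in zip(wt, we))
        if worst > 4.0 * sigma:
            break
        best = (a, b, n)
    return best

# Hypothetical input: a curved portion followed by a linear part of slope 2.
ts = [float(t) for t in range(100)]
es = [2.0 * t + 5.0 + (0.05 * (50 - t) ** 2 if t < 50 else 0.0) for t in ts]
A, B, n_used = rate_from_recent_history(ts, es, sigma=0.1)
print(A, n_used)
```

Note that the sketch needs the whole recent history of e(t) in memory, which is exactly the requirement that makes a mechanical realization difficult.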

This solution can be put into a more explicit form,
but even when greatly simplified it appears that it would be quite
difficult to carry out the calculations accurately by mechanical
means. The main difficulty is that apparently such a machine must
be capable of remembering exactly the past history of an arbitrary
function, e or something derived from it. The only methods I know
of doing this are quite inaccurate, or else very complex, and it
seems likely that the gain in mathematical precision of the above
formulation would be more than offset by a loss in mechanical
precision.

Differential  Analyzer  Types  of  Machines 

To become a bit more practical, let us now confine our
attention to machines of what might be called the differential
analyzer type. By this, we mean machines constructed of a finite
combination of adders, integrators, and function elements (e.g.
non-circular gears). Two shafts e(t) and kt enter the machine
and one shaft u(t) leaves the machine. It can be shown that any
such system must satisfy a differential equation of the type

$$f(q, \dot{q}, \ldots, q^{(n)}, t) = e(t)$$

$$u(t) = q^{(i)}.$$

First, we ask what can be said about the form of this equation to
make the machine act as a satisfactory rate finder in our sense.

1. With the same initial conditions and the same e(t) the
machine should certainly respond the same independent of
the time of start. Hence f does not depend on t.

2. When e = At + B the equation must have an equilibrium solution

$$q^{(i)} = A, \qquad q^{(i+1)} = 0$$


q  =  At  e  • 


If i > 1, the carriage of an integrator will be continuously moving
in the equilibrium condition. This does not seem practical, for the
initial conditions may be anything depending on past history, and
the integrator would surely go off scale in many cases. Obviously
from the equilibrium solution, i is not 0, for this would imply a
constant equal to a linear function of time. Hence i = 1 and
q' = u(t).

3. Let

$$f(x,y) = f(x,y,0,\ldots,0).$$

Due to the equilibrium solution

$$f(At + C, A) = At + B$$

for all A, B, t. Hence

$$\frac{\partial f(x,y)}{\partial x}\,A = A, \qquad \frac{\partial f}{\partial x} = 1, \qquad f(x,y) = x + h(y).$$

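4. Assuming f is fairly "well behaved", we have near
$\ddot{q} = \dddot{q} = \ldots = q^{(n)} = 0$ (i.e. near equilibrium)

$$f = f(q,\dot{q},0,\ldots,0) + \frac{\partial f}{\partial \ddot{q}}\,\ddot{q} + \ldots + \frac{\partial f}{\partial q^{(n)}}\,q^{(n)}$$

$$= q + h(\dot{q}) + a_2\,\ddot{q} + \ldots + a_n\,q^{(n)}$$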

and the differential operation depends on the coefficients
$a_2 \ldots a_n$ and $h(\dot{q})$. As this differential operation should not
depend on t, the $a_k$ must be independent of q, for in equilibrium
q changes with t. They may depend on $\dot{q}$, however, in which case the
differential operation depends on the target speed, which may or
may not be desirable. In the deflection mechanism this is the
case,  ag  ■   1 


5. With $\dot{q}$ near A the above reduces to

$$f = q + a_1\,\dot{q} + a_2\,\ddot{q} + \ldots + a_n\,q^{(n)} + b$$

where $a_1 = h'(A)$ and $b = h(A) - A\,h'(A)$. To eliminate backlash
oscillation the roots of this equation should all be real and for
stability all should be negative, for all desired A.

6. For complete stability, there are no doubt further requirements
on the form of f. This problem, however, is still unsolved.
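The constants quoted in item 5 are what a first-order Taylor expansion of h about the equilibrium rate A gives. The short derivation below is a sketch under that reading; it assumes only that h is differentiable at A.

```latex
\begin{aligned}
h(\dot{q}) &\approx h(A) + h'(A)\,(\dot{q} - A) \\
           &= h'(A)\,\dot{q} + \bigl[h(A) - A\,h'(A)\bigr],
\end{aligned}
\qquad\text{so}\qquad
a_1 = h'(A), \quad b = h(A) - A\,h'(A).
```

Substituting this into the expression of item 4 leaves a linear equation with constant coefficients for each fixed A, which is why the condition on the roots has to be imposed for all desired A.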

The above are only requirements on the form of f so that
it actually does find a satisfactory rate. To find the best form
of f would require a very elaborate mathematical analysis if possible
at all.

If we restrict our machine still further and assume a
linear differential equation with constant coefficients, it is
possible to give a fairly rational analysis leading to the best
values of the coefficients. The question is this. Given the
equation

$$a_0\,q + a_1\,q' + \ldots + a_n\,q^{(n)} = e$$

What values of the coefficients $a_0 \ldots a_n$ give the best
rate-finding smoothing properties? From what we said above, it seems
that the characteristic equation

$$a_0 + a_1\,p + \ldots + a_n\,p^n = 0$$

should have only real negative roots and that the rate found will
be q'. We may normalize the equation by assuming $a_0 = 1$ so that
q' is actually the rate and not merely proportional to it. In
the Heaviside symbolic notation, we have

$$q' = \frac{p}{\prod_{k=1}^{n}(b_k\,p + 1)}\;e$$

writing the polynomial in the factored form. The $b_k$ are positive
real numbers and are the time constants in the transient part of
the response. We assume the $b_k$ arranged in increasing magnitude.

Let us frame the problem as follows. Keeping the speed
of response of the circuit the same, what values of the $b_k$ give
the best attenuation of the error function? Of course, the trouble
appears in trying to decide what we mean by keeping the speed of
response the same. One answer is that we keep the maximum time
constant, that is $b_n$, the same. This may be partially justified
on the following grounds:

1. For "almost all" initial conditions, the term $A_n e^{-t/b_n}$
will eventually dominate the transient response, the other terms
becoming arbitrarily small in comparison. The only time when this
fails is when the coefficient $A_n$ happens to come out zero.

2. In the worst cases (other coefficients small in comparison)
the $b_n$ term dominates for all t, and the machine should perhaps be
designed with the worst conditions as governing.

3. If we use this criterion, it is easy to show that for best
attenuation of error frequencies all the $b_k$ should be equal. For
the magnitude of the transfer admittance (e to q') is

$$\frac{\omega}{\sqrt{\prod_{k=1}^{n}\left(1 + b_k^2\,\omega^2\right)}}$$

which is obviously smallest when each $b_k$ is made as large as
possible, for all frequencies. That is, each $b_k = b_n$, the maximum.
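The claim in point 3 is easy to check numerically. The sketch below assumes the admittance from e to q' has magnitude omega divided by the product of sqrt(1 + b_k^2 omega^2), as follows from the factored form; the function name and the two sets of time constants are hypothetical, chosen so that both sets share the same maximum time constant.

```python
import math

def rate_admittance_magnitude(time_constants, omega):
    """|Y(j*omega)| from e to q' for q' = p / prod(b_k * p + 1) applied to e."""
    denom = 1.0
    for b in time_constants:
        denom *= math.sqrt(1.0 + (b * omega) ** 2)
    return omega / denom

# Both sets have the same largest time constant b_n = 1.
equal = [1.0, 1.0, 1.0]
spread = [0.2, 0.5, 1.0]
for omega in (0.5, 1.0, 5.0, 20.0):
    print(omega,
          rate_admittance_magnitude(equal, omega),
          rate_admittance_magnitude(spread, omega))
```

At every frequency the equal set passes less of the error function than the spread set, since each factor in the denominator only grows as its time constant is raised toward the maximum.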

Another way the "same speed of response" might be
interpreted is in terms of the expected area under the transient
time curve. Keeping the standard deviation of this area
constant seems to give the same evaluation of the $b_k$ as above but
there are certain statistical assumptions in my proof that may
render it invalid.

If the characteristic equation has real roots, it may
be set up nicely as in Figure 13.

This circuit appears to have an advantage from the backlash
point of view over the more obvious one shown in Figure 14.

It seems quite possible, however, that the use of nonlinear equations
could offer a real advantage. Consider the equation

$$S(\ddot{q})\,q + R(\ddot{q})\,\dot{q} + L(\ddot{q})\,\ddot{q} = e$$

where the three coefficients are functions of $\ddot{q}$. When the system
is at equilibrium it acts approximately like

$$S(0)\,q + R(0)\,\dot{q} + L(0)\,\ddot{q} = e$$

and these three constants could be adjusted to give critical damping
and a good attenuation of the error function frequencies. On
the other hand, when we are not at or near equilibrium, $\ddot{q}$ is
(usually) considerably different from zero. The values of the
three coefficients could be adjusted to give a very
rapid response, and thus approach the equilibrium position faster.
It is possible, however, that there is some fundamental error in
this reasoning, for example, that an attempt to do this would
necessarily cause oscillation.
r  irrJ-»  j^SSS:  ^cuits. 

 ^T^T-  — ...  —  -  —  -  - 




Claude E. Shannon


The schematic diagram of a new type of height data
smoothing mechanism is shown in Figure 1. The discontinuous
height data e(t) is fed into the input shaft at intervals.
This drives a differential, connected also to the ball
carriage and roller of an integrator whose disk is turned by a
constant speed motor. A correcting hand wheel and the
integrator roller feed another differential whose output is the
output of the device. The output and input of the machine are
compared through a differential feeding a dial. The operator
is supposed to turn the handwheel in such a way that the
positive and negative oscillations of the dial about zero are
equal.

The actual height of the target h(t) is a continuous
function of time and we may assume that just after each
reading e(t) is an approximation to this. Thus h(t) and e(t) might
be as shown in Figure 2.

The shaft y(t) clearly satisfies the equation

$$y + \frac{1}{m}\,y' = e(t). \tag{1}$$

The x shaft satisfies

$$x(t) = y(t) + c(t) \tag{2}$$

and the dial reads

$$D(t) = e(t) - x(t). \tag{3}$$

During the period between height readings the position of the
e(t) shaft is constant, say $e(t_n)$, the reading taken at $t_n$,

$$y + \frac{1}{m}\,y' = e(t_n)$$

$$y = e(t_n) + A_n\,e^{-m(t - t_n)}, \qquad t_n < t < t_{n+1}$$

Since y is obviously continuous, it will follow a curve
consisting of a series of connected exponentials, each with the
same time constant, 1/m. The continuity of the curve implies

-  ^n  9  "  *  e<  V  • 

Assuming the intervals between readings the same, say a seconds,
the response y for two different time constants ma = ln 2 and
ma = ln 10 are shown in Figure 3.
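The chain of connected exponentials can be generated directly from the solution of equation (1). The sketch below is an illustration only: the function name `smooth_steps`, the sample readings, and the starting value of y are hypothetical, and each exponential's coefficient is set from continuity of y at the step.

```python
import math

def smooth_steps(readings, m, a, y0=0.0, samples_per_step=4):
    """Exact solution of y + (1/m) y' = e(t) for a staircase e(t) in which
    each reading is held for a seconds. Returns y sampled within each step."""
    y = y0
    curve = []
    for e_n in readings:
        a_n = y - e_n  # continuity of y at the step fixes the coefficient
        for j in range(1, samples_per_step + 1):
            t = a * j / samples_per_step
            curve.append(e_n + a_n * math.exp(-m * t))
        y = curve[-1]  # value carried into the next interval
    return curve

readings = [1.0, 2.0, 3.0]          # hypothetical height readings
a = 1.0                             # seconds between readings
slow = smooth_steps(readings, math.log(2) / a, a)    # ma = ln 2
fast = smooth_steps(readings, math.log(10) / a, a)   # ma = ln 10
print(slow[3], fast[3])
```

With ma = ln 2 the shaft closes half of the gap to each new reading before the next one arrives, and with ma = ln 10 it closes nine tenths of it, which is the trade between lag and smoothness discussed next.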

The larger the time constant, the more the lag in
response of y(t), but the smoother the curve. This may be
seen another way: the e to y system is equivalent to an R,
L circuit with position of shafts analogous to voltage as shown
in Figure 4. With 1/m small y follows e closely including the
irregularities. With 1/m large y(t) is smooth compared to e but
lags considerably.

Movement of the handwheel does not affect y(t) but
shifts x(t) up or down with respect to y. If the operator
turns the wheel to give equal positive and negative movements
of the dial, it may be seen that in the "steady state" (say
with f(t) = at) there is a constant lag even when the damping
is low and the interpolation nearly linear. In this case the
system bridges linearly between the mid-ordinates of the steps,
while actually it should bridge between the points ($t_n + 0$).
With higher damping the shape becomes worse but the interpolated
exponentials are nearer to the true curve most of the time. We
shall find a formula for the best time constant of the system
under the following assumptions:

1. That the "best" time constant is the one making the
actual error least in the mean square sense.

2. That we may take as the true curve, so far as our
knowledge goes, the linear interpolation between
the points $t_n + 0$. This may be justified by the
fact that the device cannot in any way perform
higher order interpolation - the curve y(t) is
convex upward whenever e(t) increased in its last step
over the final value of y from the preceding step,
and this is quite independent of the curvature of
h(t).

3. That the system is in a "steady state", that is,
that in the step under consideration y(t) ends at
the same distance below e(t) as it was just before
the step.

4. That the steps come at approximately equal
intervals of a seconds.

An  interval  under  these  conditions  is  shown  in 
Figure  5.    Here  we  assumed  that  the  hand  wheel  was  turned  to 
give  a  ratio  of  -2_  as  deflection  of  the  dial  just  after  to 
just  before  a  step. 

We have


y  -  A  e 


ylo)  -  b  -  y(a) 
A  -  b  •  a  e" 


1  -  e 

b  a~mt 

7  " 



s  -  y  -  y(o)  +c 

-    1  -  <3"BA 

-  o  —  s—     +  c 



The  Integral  of  the  squared  error  per  second  is  then 

-2  1 

-  b 

i  -mt  . 
1  -  e_aa  a 


-  8  - 

k  u2  SJL-  in  *  i  e-^  ! 

1  -  e 

-  2 

1  -  e-D  L2 


k2  ♦ 

3  u^rs(1-  ,+-t^j 

+  k  - 

3  k  L 
1  -  ^ 

1  -  e~D) 

D  ) 

l-0-D  [2  (D  d£) 


&  ♦*  ♦    2   ♦  i  (2  ♦  4k)  *  D  ♦  3  +  5e'D 

13  }       2      ^      ll--D)2  20  (1  .  e-D) 

It is evident from physical considerations that the minimum of
this expression occurs for a fairly large D. In fact the error
curve was plotted for k = .5 (Figure 6) and the minimum is seen
to be at about 7 or 8. With D this large the above
expression is very nearly equal to

since $e^{-D}$ is very small. To locate the minimum we have

2*  -  jL  -  2D  (2  +  3k )  -  2  f ( 2  ♦  4k )  3  +  3]  .  Q 
D2      D3  4  D2 

$$(6 - 8k)\,D - 16 = 0$$

$$D = \frac{8}{3 - 4k}$$

For $k = \tfrac{1}{2}$

$$D = 8$$

Since the minimum is so flat (Figure 6) this formula is
certainly close enough. However a second approximation may be
found as follows: for x small $\frac{1}{1-x} \approx 1 + x$. Using this in
the exact expression to eliminate the denominators we get as a
second approximation


-  tl*k)  U+e"D)  -  J5  llWD)  -  ±  (l*e-3)  e"3 



£5  -  0  «  -  8  ♦  (3- 4k)  D  +  [6D  (D*l)  *  2D3  lk-1)]  e~D+  6D  (D+l) 

Using the first approximation to obtain the values involving
exponentials, a better value may be obtained. For $k = \tfrac{1}{2}$ the
second approximation is D = 8.03. The first and second
approximations are plotted in Figure 7.

With $k = \tfrac{1}{2}$ the curve x(t) is plotted for an interval
with the "best" D, in Figure 8. It will be noted that the
curve is highly damped in comparison to the time between
readings. The RMS error is then equal to

It is interesting to compare this with the RMS errors obtained
under other conditions. If the device is not used at all, but
a direct coupling made between the input and output, the RMS
error between the step function and the linear interpolation
between points $t_n + 0$ is

$$\left(\frac{\varepsilon}{b}\right)^2 = \frac{1}{a}\int_0^a \left[0 - \left(-\frac{t}{a}\right)\right]^2 dt = \frac{1}{3}, \qquad \frac{\varepsilon}{b} = \frac{1}{\sqrt{3}} = .577$$

so that the RMS error has been reduced to 40% of this value.
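The best D and the 40% figure can be cross-checked numerically. The sketch below rests on an assumed model that is not spelled out legibly in the text: time is measured in units of a, the error in units of the step height b, and the error over one interval is taken as g(tau) - k - tau with g(tau) = (1 - exp(-D tau)) / (1 - exp(-D)). The function names, the grid limits, and the use of Simpson's rule with a grid search are choices made for illustration.

```python
import math

def mean_square_error(D, k, n=400):
    """Simpson's-rule value of integral_0^1 [g(tau) - k - tau]^2 dtau,
    with g(tau) = (1 - exp(-D*tau)) / (1 - exp(-D)). n must be even."""
    def err2(tau):
        g = (1.0 - math.exp(-D * tau)) / (1.0 - math.exp(-D))
        return (g - k - tau) ** 2
    h = 1.0 / n
    total = err2(0.0) + err2(1.0)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * err2(i * h)  # odd points weigh 4
    return total * h / 3.0

def best_damping(k, lo=2.0, hi=20.0, step=0.05):
    """Grid search for the D = m*a that minimises the mean square error."""
    count = int(round((hi - lo) / step))
    candidates = [lo + i * step for i in range(count + 1)]
    return min(candidates, key=lambda D: mean_square_error(D, k))

k = 0.5
D_first = 8.0 / (3.0 - 4.0 * k)        # first approximation from the text
D_best = best_damping(k)               # numerical minimum of the assumed model
rms_best = math.sqrt(mean_square_error(D_best, k))
rms_direct = 1.0 / math.sqrt(3.0)      # direct coupling, .577 of the step
print(D_first, D_best, rms_best / rms_direct)
```

Because the minimum is so flat, the numerical optimum need only land near the first approximation for the RMS ratio to come out close to the quoted 40%.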

In Figure 9, the output of the smoothing mechanism,
x(t), is plotted for a certain forcing function e(t), using
the "best" value of m. It may appear that the output is still
far from smooth, and this is in a sense true, but it must be
remembered that the variations in e(t) are here greatly
exaggerated over what would be expected in practice.

Finally it should be pointed out that a very material
improvement in operation could be obtained if the operator
were trained to turn the handwheel to obtain a ratio


nearer  to  zero  than        This,  however,  would  probably  be  im- 




<  f  » 





Claude  E.  Shannon 

June  26,  1941 

Some  Experimental  Results  on  the  Deflection  Mechanism 

In a previous report, "A Study of the Deflection Mechanism and Some
Results on Rate Finders," a mathematical study was made of a new type of
deflection mechanism. The present paper is a further study of this
device and a report on some experimental results obtained on the M.I.T.
differential analyser.

For convenience in reference, the schematic diagram of the machine
is repeated in Fig. 1. In the report mentioned, the utility of the
middle part of the device was questioned. This arose from a
misunderstanding of the basic assumptions underlying the design and was
cleared up in a conference with Dr. Tappert. The writer's analysis was
under the assumption that the mechanism was designed to find rates for
linear forcing functions only (i.e., that higher order terms were small
by comparison), and the analysis is still valid if this is true.
However, in practice, it appears necessary to assume higher order
forcing functions and the deflection mechanism is designed to give the
correct steady state rate (except for the non-linearity of the sine
gear) for an arbitrary quadratic forcing function. Actually the middle
part (often referred to hereafter as the "x" part) of the device is
certainly well worth while, as will be seen from some of our
experimental curves.

If a linear mechanism has a transfer admittance $Y(j\omega)$ from input
e(t) to output $\dot{q}(t)$ then

$$j\omega\,Q(j\omega) = Y(j\omega)\,E(j\omega)$$

where E and Q are the transforms of e and q. It is easily seen from
transform theory that if e(t) = at + b, a necessary and sufficient
condition that $\dot{q}(t) \to a$ as $t \to \infty$ is that

$$\lim_{\omega \to 0} \frac{Y(j\omega)}{j\omega} = 1.$$

If this condition is satisfied the system may be called a first order
rate finder - after the transient has died out, the output is the
derivative of the input whenever the latter is linear. Similarly if

$$Y(0) = 0, \qquad Y'(0) = j, \qquad Y^{(k)}(0) = 0, \quad k = 2, 3, \ldots, n$$

we have an nth order rate finder - in the steady state it finds the rate
of an nth degree polynomial forcing function. In the deflection mechanism
we have a second order rate finder


-       +  e^w3  +  CgW*  ♦  ... 
if  we  assume  /      ■     nearly  1.    A  oircuit  for  solving 

A  ♦  42 

i  -  sin"1  4 

under  the  same  approximation,  to  the  nth  order  is  shown  in  Fig.  2.  The 
admittance  here  is  approximately 

1  #  a1(»  ♦  a2(»2  ♦  ...  +  Vl(j<u)n+1  ^ 
the  values  of  the  constants  in  the  mechanism  are 

1  »  4.63  J"» 

y(»  x  S  **oa  r  *  J" 

1  ♦  4.63  5.73  (j-r  ♦  1.094  (»S 

_  (1  ♦  4.63  .1«Qj«rf 

In the previous report it was pointed out that due to a clutch and
stop on the input to the sine gear values of $\dot{q}$ were limited to two
horizontal lines (see Fig. 6 in that report). There is also a clutch and
stop on the displacement of the lower integrator. This effectively
further limits solutions to a parallelogram as shown in Fig. 3. Actually
the limitation is fictitious - the q shaft can turn an unlimited amount,
but when this stop is in effect the stability point moves at such a speed
as to be equivalent to q and $\dot{q}$ moving along one side of the
parallelogram. Thus if we keep the stable point stationary, paths of
representative solutions will be as indicated in Fig. 3.

The trial solutions taken on the differential analyser may be
classified as follows:

I. Solutions taken with the mechanism as designed.

A. Simple analytic forcing functions.

1. e(t) = a

2. e(t) = at + b

3. e(t) = at^2 + bt + c

4. e(t) = at^3 + bt^2 + ct + d

B. Response for 8 typical target courses, the target vector
velocity constant.

C. The response to some error functions superposed on typical
courses.

D. An attempt to get backlash oscillation.

II. Approximately the same program although less extensively with the
middle part eliminated.

III. A few runs with typical courses using three different third order
rate finders.

The constants of the target courses used were as follows (see Fig. 4):
Course I S = 150 yds/sec = 307 mi/hr


7  «  2,000  yds 
h^  -  1,000  yds 

$     m  0° 

Course  II        8    •  150  yds/seo 

2,000  yd. 
h^  -  500  yds 

*  "0 

Course  III       8    -  150  yds/seo 

V  -  4,000  yds 
ha  •  1,000  yds 

•  -  0 


Course  IT 

S    -  150 

V    -  2,000 

h    -  2,000 

0    -  0 

Course  Y 

Course  VI 

S    -  150 

V  -  4,000 


h    -  4,000 

9    -  -  14.96° 

V  -  4,000  -  40  t 

S„  -  150 

V  -  2,000 

h    -  1-000 


*    -  -  14.96° 

V  -  2,000  -  40  t 

Course  VII 

B    -  96.6 


V    -  3,000 

hn  -  1.000 
6  -  -  60° 
V  -  3,000  -  115  t 

Course  VIII  8-150 

V  -  4,000 
hm  -  500 
•  •  0 
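As a rough arithmetic check on the course constants, the sketch below converts the 150 yds/sec speed to miles per hour and evaluates S tan θ for the inclined courses. The function names are illustrative only, and taking the course VII angle as 50° is an assumption made here because, with S = 96.6, it reproduces the slope of 115 yds/sec listed for that course.

```python
import math

YARDS_PER_MILE = 1760.0

def yds_per_sec_to_mph(speed):
    """Convert a speed in yards per second to miles per hour."""
    return speed * 3600.0 / YARDS_PER_MILE

def linear_rate(speed, angle_deg):
    """Rate S * tan(theta), in yds/sec, for a course inclined at angle_deg."""
    return speed * math.tan(math.radians(angle_deg))

mph = yds_per_sec_to_mph(150.0)       # about 307 mi/hr
rate_v = linear_rate(150.0, 14.96)    # about 40 yds/sec (courses V and VI)
rate_vii = linear_rate(96.6, 50.0)    # about 115 yds/sec (course VII, assumed angle)
print(round(mph, 1), round(rate_v, 1), round(rate_vii, 1))
```

The agreement of the last two figures with the coefficients of t in the course list suggests that the linear terms were obtained as S tan θ.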

The distribution of these courses is indicated in Fig. 5, together
with the approximate maximum range of the 3" A.A. gun (21 sec. fuse setting).

The actual input to the deflection mechanism is

r*  s  h  t 

a       o  p 

but since it was desired to compare the actual output with the true
$\sin^{-1} \dot{e}$, the quantity $\dot{e}$ was plotted against t and
integrated to provide the input. To calculate $\dot{e}$ the following
method was found to be the simplest. We have

8  h  t 

'  --P  **- 

o  p 

A computation schedule was set up based on this formula, working
backwards from the time of burst $t + t_p$ to the present time




t  ♦  t  h  V 

P  P  p 

"  h/l*£8g(t*tp)J2         -  yi-  (ftp)Sgtan  *] 

*p  t  /  78— IT 

from  -  I  -  TV 


The ballistic data used in getting $t_p$ (IV) was read from the chart
(Fig. 24, opposite p. 59), Coast Artillery Field Manual, FM 4-110. The
value of $t_p$ was merely read off corresponding to the computed values
of $r_p$ and $h_p$.

If we assume as an approximation that the shell velocity is constant,
k yds/sec (i.e., that the equi-time of flight curves in the chart are
circles) so that with V constant

,  2.2      .2  „2 
k  t    «  h    +  V 

P  P 

h    -  h    +  S  (t+t  )  ' 
p       m       gv  p' 

p  m 

h/h"  ♦  S  t2 

we can eliminate $t_p$ and $h_p$ from the system to obtain the following
equation between $\dot{e}$ and t:


e2[k2(hm*Sgt)2(h^2)-  (h2*S^)V2S2] 

+  *[2  vsWhfVTt2]  -  C^5T2*TT2(h  *ts  )2]  -  o 

g  m  n    g    '      1    g  m     g  m*  m  g'J 

Evidently the same curve $\dot{e}(t)$ is obtained if $h_m$ and $S_g$ are
both multiplied by the same constant.

The differential analyzer set-up used is shown in Fig. 6. An attempt
was  made  to  generate  the  sine  function  with  two  integrators  solving 

but this was found impractical because of the large integrator loading
necessary, and an input table was used instead. Even in this case it was
necessary to use a very large scale factor on the independent variable
shaft due to the small integrating factors (1/32) of the differential
analyzer as compared to the ball type (about 1 under comparable
conditions). This resulted in solutions which represented, actually, 30
seconds requiring 30 minutes of machine time.

The  equations  of  the  deflection  mechanism  are 

9  i  *  .54  x  -  .54  | 

♦  4.700  q  ♦  1.692  q  -  1.692  e  ♦  4.700  x 

1 1-4 

It was necessary to approximate the coefficients with available gear
ratios on the differential analyzer. Fortunately some very close
approximations were found. The equations actually set on the machine were



*  ♦  .54 :X  -  .54  i 
♦  4.706  $  ♦  1.694  q  -  1.694  e  +  4.706  x 

The error is of the same order as the expected machine error.
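The closeness of the gear-ratio approximations can be put in figures. The sketch below compares the two coefficients that differ between the design equations and the equations set on the machine (4.700 against 4.706, and 1.692 against 1.694); the function name is illustrative only.

```python
def relative_error(design, actual):
    """Relative deviation of the coefficient set on the machine from the design value."""
    return abs(actual - design) / abs(design)

# (design value, value set on the machine)
pairs = {"4.700": (4.700, 4.706), "1.692": (1.692, 1.694)}
errors = {name: relative_error(d, a) for name, (d, a) in pairs.items()}

for name, err in errors.items():
    print(f"coefficient {name}: {100 * err:.2f} per cent")
```

Both deviations come out at a little over one part in a thousand.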

Except for runs in group ID the machine was made as "tight" as
possible, the backlash being corrected by frontlash units. Due to the
large scale factors used and the high inherent precision of the
integrators used in the differential analyzer, the runs may be expected
to be more accurate than the actual deflection mechanism.

Solutions were taken in the form of both curves and counter readings.
The curves given here were reproduced by pantograph to ordinary graph
paper size. Curves not directly drawn by the machine and numerical values
quoted are taken from the counter printings, which give an additional
decimal place not readable from the curves.

Discussion  of  Runs 

Most of the curves are given with $\dot{q}$ as dependent variable. To
estimate the error in yards for a given error in $\dot{q}$ from
$\dot{e}$, the chart of Fig. 6A may be used. This is computed from the
approximate formula

r  cos  t  IS 

.  r££L*  Aq  -  r  A(e,q)  Aq 

For rough comparisons the coefficient A may be taken as 1, the error then
being the $\dot{q}$ error multiplied by the predicted range.
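The rough rule just stated amounts to a single multiplication. The sketch below applies it to hypothetical figures (a rate error of .01 at a predicted range of 2,000 yards); the function name and the sample values are illustrative and are not taken from the runs.

```python
def error_in_yards(rate_error, predicted_range_yds, coefficient=1.0):
    """Approximate miss distance: range times coefficient A times the rate error."""
    return predicted_range_yds * coefficient * rate_error

# Hypothetical example: rate error .01 at 2,000 yds with A taken as 1.
print(error_in_yards(0.01, 2000.0))
```

With A taken as 1 these figures give 20 yards; the chart of Fig. 6A would be needed for a closer estimate.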

The first set of runs taken were with a sudden impulse $e = k\,1(t)$
with the system at rest, both with and without the middle part of the
mechanism. Runs were taken with

k = 0.1, 0.2, 0.4, 1.0, 2.0

Typical curves are shown in Figs. 7 and 8. The results are very close to
computed curves on the assumption that $1/\sqrt{1 - \dot{q}^2} = 1$ when
k < .4, but above this the non-linearity becomes appreciable. In the
worst cases the transient disappeared to within machine errors in 25
seconds, and for most cases within 8 to 12 seconds. The action with the
middle part out was considerably more rapid than with it in, the
transient being 6 times as great, as had been predicted, this being a
special case of a linear forcing function. Fig. 9 is a plot of the time
required for the transient in $\dot{q}$ to reduce to 2/10 of its maximum
value. For values of k greater than about .35 the curves cross the axis
once with the middle part in. The curves with it out are all identical
with k > 2, due to the action of the slip clutch on one integrator.


Next a series of runs were taken with

$e = k\,t\,1(t)$

starting from rest, with

$\sin^{-1} k$ = steady state $\delta$ = 15°, 30°, 45°, 60°, 75°, 80.6°

the last being the limit of the sine gear, the maximum possible
deflection. These runs are shown in Figs. 10 and 11. The transient died
out in all cases within 20 seconds except with x in for $\delta$ > 75°,
in which cases 30 seconds or more was required, due to the action of the
slip clutch. These long transients, however, would probably not be
troublesome since such large deflections would only occur in practice
with the plane almost directly overhead. For the smaller values the
response is about equally rapid with x in or out.
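The relation between the slope k of the forcing function and the steady state deflection can be tabulated directly. The following sketch, with illustrative function names, evaluates k = sin(deflection) for several of the deflections listed and the inverse relation for a given k.

```python
import math

def slope_for_deflection(deflection_deg):
    """Slope k of the linear forcing function giving this steady state deflection."""
    return math.sin(math.radians(deflection_deg))

def deflection_for_slope(k):
    """Steady state deflection in degrees for a forcing function of slope k."""
    return math.degrees(math.asin(k))

for angle in (15, 30, 45, 60, 75):
    print(angle, round(slope_for_deflection(angle), 4))

print(round(deflection_for_slope(0.6), 2))
```

The inverse relation shows, for example, that a slope of .6 corresponds to a deflection of about 37°.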

Quadratic Forcing Functions

The runs with a quadratic forcing function

$e = at^2$

were the first to show the superiority of the mechanism with x in. Runs
were taken with

a = .01, .02, .03, .04, .10

With a quadratic rate finder the solution $\dot{q}$ should approach 2at,
and with x in this was very nearly true, the discrepancy being due to the
sine gear. Some solutions are shown in Figs. 12, 13, and 14. The errors
increase with a and with [symbol illegible]. The maximum slope found in
any of the 8 courses plotted is about equivalent to an a of .05, so that
the large errors due to the sine gear with a = .10 need not cause great
concern.
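The statement that a quadratic rate finder should approach 2at can be illustrated on a linear model. The sketch below is only that: it assumes the third order system with coefficients 1.094, 5.73 and 4.63 quoted in the conclusions, assumes a right-hand side of the form 4.63 de/dt + e (the form for which the steady state rate equals 2at), ignores the sine gear and the clutches entirely, and integrates by a fourth order Runge-Kutta rule. The function name and step size are illustrative.

```python
def settled_rate(a, t_end, dt=0.01):
    """Integrate the assumed linear model with e(t) = a*t**2 and return q'(t_end).

    Assumed model: 1.094 z''' + 5.73 z'' + 4.63 z' + z = e(t), q = z + 4.63 z'.
    """
    c3, c2, c1 = 1.094, 5.73, 4.63

    def deriv(t, state):
        z, z1, z2 = state
        e = a * t * t
        z3 = (e - c2 * z2 - c1 * z1 - z) / c3
        return (z1, z2, z3)

    def shifted(state, slope, h):
        return tuple(s + h * k for s, k in zip(state, slope))

    state = (0.0, 0.0, 0.0)              # system at rest
    steps = int(round(t_end / dt))
    for i in range(steps):
        t = i * dt
        k1 = deriv(t, state)
        k2 = deriv(t + dt / 2, shifted(state, k1, dt / 2))
        k3 = deriv(t + dt / 2, shifted(state, k2, dt / 2))
        k4 = deriv(t + dt, shifted(state, k3, dt))
        state = tuple(
            s + dt / 6 * (p + 2 * q2 + 2 * r + w)
            for s, p, q2, r, w in zip(state, k1, k2, k3, k4)
        )
    _, z1, z2 = state
    return z1 + c1 * z2                  # q' = z' + 4.63 z''

a = 0.02
print(settled_rate(a, 40.0), 2 * a * 40.0)
```

On this linear model the rate settles to 2at once the transient has died out; the discrepancies reported in the runs come from the sine gear, which the sketch leaves out.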


Cubic Forcing Functions

For cubic forcing functions the following were used:

$e_1 = -.04 t^3 + .1 t^2$

$e_2 = -.001 t^3 + .05 t^2$

$e_3 = -.0002 t^3 + .02 t^2$

These were chosen as having second order tangency at t = 0 so that the
transient is small. The results are shown in Figs. 15 and 16. The
responses with $e_2$ and especially $e_3$ are very close to the
calculated values on assuming the equation linear. The error in $e_1$ is
somewhat greater, as in the quadratic case with higher acceleration.

Effect of Backlash

A number of runs were made to determine the effect of backlash using
several different forcing functions. In order to increase the amount of
backlash, frontlash units were inserted at several critical points in the
backwards direction. The results of these runs were, however, completely
negative, for no oscillation of any sort was discovered. The system was
given "shocks" by sudden turning of the e shaft and other methods, but
the solutions were completely stable. The only results were small
consistent errors, of the order of magnitude of the backlash. It is
possible that due to the large scale factors used in the set-up, even the
artificially introduced backlash was not sufficient to cause the
oscillation effect.

Response  for  Typical  Courses 

The responses for the 8 courses described above are shown in Figs. 17
to 24. It may be noted that even on the flat courses (e.g., IV) the
operation is poor without x. On the flat courses the response is
satisfactory with x, the error being less than 20 yards except sometimes
at the hump in e. However for the steeper courses errors of 60 or more
yards are common after the start of the peak which do not disappear until
nearly the end of the course. The action is particularly bad coming down
the hump. Fig. 25 is a plot of the error in yards with course VIII, x in.


Response  to  Error  Functions 

In Figs. 26 to 28 are shown the responses to some random error
functions of various kinds superimposed on courses I and II. The
operation in damping out the error is considerably better with x out.
However it seems from a consideration of the size of the errors
introduced and the responses found that the system, even with x in, damps
the errors more than necessary. That is, it might be preferable to
increase the speed of response so as to reduce the transient errors in
the solutions.

Figs. 29 and 30 show the responses when we suddenly start tracking a
target in courses I or II with the machine previously at rest, with the
target at several points along the course.

Tests  with  Different  Equations 

Three runs were made on course VIII, the most difficult one of the
group, using three different cubic rate finding equations. The equations
used were (assuming linearity) critically damped, with the transfer
admittances

[i  ♦  2(>)r 


(2)  4  .  1  *  4(j«fr  ♦  6(J.) 

[i  ♦  (J-)]4 

The results of these runs are shown in Figs. 31, 32, and 33 and
should be compared with Fig. 24. Of course, this gain is accompanied with
a loss in error function damping. With the roots equal to 2 the system
had a slight tendency to be unstable on the flat part of the course. This
however appeared to be due to the "human backlash" in the operator on the
sine table and would probably not be present with a sine gear.

It is easily seen that an increase in the values of the characteristic
roots of the equation demands a proportional increase in the power
requirements of the integrators. It may be that this will be a design
limit in the case of mechanical systems. No difficulty would be
experienced here however with electrical integrators.


The  main  conclusions  of  this  work  are  as  follows: 

1.  The  middle  part  of  the  machine  is  definitely  worth  while. 
Although  it  increases  response  for  accidental  following  errors,  the  gain 
in  behavior  for  actual  courses  more  than  offsets  this  disadvantage. 

2.  The  system  behaves  nearly  enough  like  the  linear  system 

1.094  "q  ♦  5.73  q  ♦  4.63  q  ♦  q  -  4.63  I  *  4.63  e 

that this may be used to calculate its [response] to within a few per
cent, providing q < .6. As this corresponds to a deflection of 37°, the
approximation is sufficient for most cases.

3.  For targets whose elevation at their nearest point is greater than
about 50° fairly large errors occur due to substantial cubic and higher
degree terms in e. This indicates that it might be worth while to use a
higher order rate finder. Tests made with a cubic rate finder showed
greatly improved results.

4.  If the additional cost of another integrator and adder required
for cubic rate finding is too great to be justified it appears that the
system could be improved by reducing the time constants, for if
sufficient power is available from the integrators, the only disadvantage
would be increased response to random error functions, and our results
indicate that they are now damped out more than necessary.

5.  There is some indication that better results would be obtained
by making the three time constants equal, or more nearly equal than they
are now, although this is not certain.



Criteria for Consistency and Uniqueness in Relay Circuits


September 8, 1941

In a system of linear algebraic equations, there are three possible
types of degeneracy, namely inconsistency (no possible solution),
ambiguity (solutions not uniquely determined) and redundancy (more
equations than necessary). Necessary and sufficient conditions are known
for these types of degeneracy in terms of the ranks of the coefficient
and augmented matrices. Somewhat similar effects can occur in the Boolean
equations characterizing relay circuits, giving rise respectively to
chattering, ambiguity of relay position for certain values of the
independent variables, and redundancy [remainder of sentence illegible].
We shall [...] these conditions in terms of a circuit discriminant f.

Consider a relay circuit containing n relays $X_1, X_2, \ldots, X_n$.
Make and break contacts on $X_i$ are designated $x_i$ and $x_i'$, and we
suppose that there are m independent variables $a_1, a_2, \ldots, a_m$,
which do not depend on the relay positions. Such a circuit is equivalent
to the circuit of Fig. 1 in which

$f_i(x_1, x_2, \ldots, x_n; a_1, a_2, \ldots, a_m)$

is the Boolean function which is zero when the switches
$x_1, \ldots, x_n, a_1, \ldots, a_m$ are in such positions that the
voltage across $X_i$ in the original circuit is sufficient to operate it
and one otherwise. The function

(1)    [equation illegible]

will be called the circuit discriminant. We also define the following
terms. A steady state in a relay circuit corresponding to a given set of
values of the independent variables $A_1, \ldots, A_m$ is a set of
positions $P_1, P_2, \ldots, P_n$ of the relays such that if the
independent variables are given the values $A_1, \ldots, A_m$ and the
relays held in the positions $P_1, \ldots, P_n$ long enough for the
steady state fluxes in the coils to build up, the relays will remain in
the same positions indefinitely.

A completely oscillatory state of a relay circuit is a set of values
$A_1, A_2, \ldots, A_m$ of the independent variables, such that no matter
what the initial positions of the relays, or how long they are held in
that position, when they are released at least one makes an infinite
number of oscillations, i.e. chatters. In addition to these obviously
exclusive possibilities a circuit may be "partially" oscillatory for
certain values of the independent variables: with some initial conditions
the circuit chatters and with others relapses into a steady state. An
example is shown in Figure 2, where with the initial conditions

$R_1 = 0$ (operated) [...]

the circuit chatters, while with

[...]

the circuit relapses into the steady state $R_1 = 1$, $R_2 = 1$.

Theorem I. For $P_1, \ldots, P_n$; $A_1, \ldots, A_m$ to be a steady
state it is necessary and sufficient that

$f(P_1, \ldots, P_n; A_1, \ldots, A_m) = 0$.

This is necessary since in a steady state the contacts of relay [...]

[equation illegible]

so that

[equation illegible] when $x_i = P_i$, $a_i = A_i$.

It is sufficient since

[equation illegible]

so that if the relays are held in these positions $P_i$ long enough for
fluxes to build up they will remain there.


Theorem II. For $A_1, \ldots, A_m$ to be completely oscillatory it is
necessary and sufficient that

$f(x_1, \ldots, x_n; A_1, \ldots, A_m) = 1$

identically in the $x_i$. This is necessary since otherwise there is a
set of $x_i$, say $P_i$, such that f = 0, and this is a steady state by
Theorem I. It is sufficient since if true then with any starting position,
say $P_1, \ldots, P_n$, at least one term of the sum (1), say [term
illegible], is equal to one, so that

[equation illegible]

and one or the other must change. After some relay has changed we still
have the same situation since f = 1, so that at least one relay makes an
infinite number of changes of position.


In case $f(x_1, \ldots, x_n; A_1, \ldots, A_m)$ is a function of the
$x_i$ (not identically one or zero) the system has some steady states,
namely the roots of f = 0, but for arbitrary starting conditions we
cannot say what the action will be. Whether a circuit seeks out a steady
state or not depends not only on the network topology as in Fig. 2, but
also on relay characteristics as in Fig. 3. Here if $R_1$ is slow
operating and $R_2$ very fast the circuit may chatter with both relays
initially unoperated, for $R_2$ may never stay in long enough to operate
$R_1$. If $R_1$ is fast and $R_2$ slow release, the system relapses into
$R_1 = 0$, $R_2 = 1$. Hence no purely algebraic conditions can be set up
to determine whether a circuit will relapse into a steady state when f is
a function of the $x_i$.
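The two theorems lend themselves to a brute-force check on small circuits. The sketch below is an illustration under stated assumptions and not the notation of the memorandum: it uses the convention True = relay operated (rather than a function that is zero when the relay operates), takes a steady state to be a set of positions in which every winding is energized exactly when its relay is operated, and applies this to two hypothetical circuits that are not those of Figs. 2 and 3.

```python
from itertools import product

def steady_states(coil_functions, inputs):
    """Return every tuple of relay positions that reproduces itself.

    coil_functions[i](x, a) is True when the winding of relay i is energized,
    given relay positions x (True = operated) and independent inputs a.
    """
    n = len(coil_functions)
    found = []
    for x in product((False, True), repeat=n):
        if all(g(x, inputs) == x[i] for i, g in enumerate(coil_functions)):
            found.append(x)
    return found

def is_completely_oscillatory(coil_functions, inputs):
    """True when no set of relay positions is a steady state for these inputs."""
    return not steady_states(coil_functions, inputs)

# Hypothetical one-relay "buzzer": the winding is fed through a key a[0]
# in series with the relay's own break contact.
buzzer = [lambda x, a: a[0] and not x[0]]

# Hypothetical pair of relays, each winding fed through the other's break contact.
cross_coupled = [lambda x, a: not x[1], lambda x, a: not x[0]]

print(steady_states(buzzer, (False,)))             # key open
print(is_completely_oscillatory(buzzer, (True,)))  # key closed
print(steady_states(cross_coupled, ()))
```

For the hypothetical pair the enumeration finds two steady states, which is the ambiguous case; which of them is reached from both relays unoperated, or whether the circuit chatters, depends on relay timing and cannot be read off from the algebra.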

July 15, 1943

Copy No.




Professor W. Feller of Brown University and
Dr. C. E. Shannon of the Bell Telephone Laboratories

AMP REPORT NO. 28.1


This  is  a  report  on  Investigations  made  at  the  request 
of Dr. Warren Weaver (letter of December 28, 1942). Our study
has  been  based  partly  on  oral  information  received  in  Aberdeen 
(January  18,  1942)  and  partly  on  the  material  contained  in  the 
Report  No.  319  of  the  Ballistic  Research  Laboratory  ("Report 
on  the  Differential  Analyzer  at  Aberdeen  Proving  Ground"  by 
Major  A.  A.  Bennett,  December  1942).    The  technical  set-up 
as  described  in  that  report  will  in  the  sequel  be  referred  to 
as  "present  set-up".    It  should  be  clearly  understood  that  we 
were  not  to  study  possible  technical  improvements  of  the  ana- 
lyzer as  such  nor  to  reexamine  the  theory  underlying  the  dif- 
ferential equations.    Accordingly,  the  present  report  is  con- 
cerned only  with  an  examination  of  the  procedure  of  mechanical 
integration  of  the  differential  equations  of  ballistics  as 
used  at  present.    Furthermore,  we  have  not  considered  any  methods 
of  integration  other  than  on  the  differential  analyzer. 

Before  proceeding  to  describe  devices  which  might 
contribute  to  the  efficiency  of  the  analyser  we  wish  to  summarize 
some  negative  findings,  as  these  may  render  superfluous  similar 
investigations  by  other  persons. 

a)    We  have  carefully  investigated  a  great  number  of 
alternative  set-ups,  on  the  differential  analyzer,  of  the  dif- 
ferential equations  either  in  their  present  form  or  using 
various  new  variables.    However,  we  have  been  unable  to  find 
any  form  superior  to  the  method  as  used  at  present  in  Aberdeen 

which,  in  our  opinion,  is  the  most  efficient  one. 

b)  We  have  studied  the  advisability  of  using  some 
method  of  successive  approximations.    Such  methods  naturally 
present  themselves  since  one  should  expect  them  to  reduce  the 
ranges of the variables involved and thus increase the accuracy.
However,  a  closer  study  will  show  that  it  is  almost  invariably 
necessary  to  subtract,  on  the  analyzer,  two  large  quantities 
which  are  themselves  independently  obtained  on  the  analyser. 
This,  of  course,  nullifies  the  desired  effect  of  reducing  the 
ranges.    Various  possibilities  have  been  studied  and,  among 
them, the possibility of starting with the vacuum trajectories
and  integrating  the  difference  between  them  and  the  actual 
trajectories.    Again  we  were  unable  to  find  a  method  which 
would appear superior to the present set-up. It will be noted,
however, that the modification of the latter suggested below
can in some sense be interpreted as the first step in a method
of successive approximations.
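The objection to subtracting two large quantities can be made concrete with a small calculation. In the sketch below all figures are hypothetical: two quantities near full scale are each assumed to be obtained to one part in a thousand of full scale, and the worst-case relative error of their difference is computed. The function name is illustrative.

```python
def difference_relative_error(a, b, full_scale, accuracy):
    """Worst-case relative error of a - b when each term is known
    to within accuracy * full_scale."""
    absolute_error = 2.0 * accuracy * full_scale
    return absolute_error / abs(a - b)

# Hypothetical figures: full scale 1000, each quantity good to 0.1 per cent of it.
print(difference_relative_error(1000.0, 990.0, 1000.0, 0.001))
```

With these figures a difference of 10 carries a possible error of 2, that is 20 per cent, although each term separately is good to a tenth of one per cent of full scale.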

c)  Several  perturbation  methods  and  expansions 
according  to  various  parameters  have  been  tried  paying  special 
attention to methods suggested in the newest Russian literature.
None of these methods seems appropriate for the analyzer.

Coming to the less negative part of this report we remark that an
adequate theory of errors of the differential analyzer is not available
at present. However, simple theoretical considerations based on
experience gathered at M.I.T. make it appear that a very considerable
part of the total error is due [...] of error are backlash and, perhaps
even more, inaccuracies in the following mechanism for the input and
vector tables. It seems therefore possible to achieve a gain in accuracy
by reducing the range of the variables in the integrators, even though
this may necessitate the introduction of new adders and gears. The
following recommendations are based on this assumption. We proceed step
by step starting with the simplest.
Recommendations.

1)    Consider, to begin with, the horizontal displacement x. Obviously
dx/dt will range from its maximum $\dot{x}_0$ at the beginning to some
fraction of it, say $q\dot{x}_0$, at the end. Accordingly, when
integrating in the usual form

(1)    $x = \int \dot{x}\,dt$

the integrand ranges from $q\dot{x}_0$ to $\dot{x}_0$. Now this means
that only a fraction $\frac{1-q}{2}$ of the total range of the integrator
disc is used even if we suppose that the scale factor has been chosen in
the best way (so that the rim of the integrator disc is used for values
of $\dot{x}$ near $\dot{x}_0$). If, instead, we mechanize

(2)    $x - \frac{1+q}{2}\dot{x}_0 t = \int \left(\dot{x} - \frac{1+q}{2}\dot{x}_0\right) dt$,

the integrand will range from its maximum $\frac{1-q}{2}\dot{x}_0$ to its
minimum $-\frac{1-q}{2}\dot{x}_0$. This allows one to use a scale factor
$\frac{2}{1-q}$ times as large as in the set-up (1) and to utilize the
entire integrator disc. This, of course, means a considerable gain.

Now the constant $\frac{1+q}{2}\dot{x}_0$ in the integral in (2) appears
only as an initial displacement. It is therefore seen that the
realization of the proposed set-up (2) requires, as compared with the
customary set-up (1), an additional gear (to produce
$\frac{1+q}{2}\dot{x}_0 t$) and an adder. The following figure shows the
simplest mechanization.




[Figure: mechanization of set-up (2), with output $x - \frac{1+q}{2}\dot{x}_0 t$]


It goes without saying that the gear ratio does not need to be exactly
$\frac{1+q}{2}\dot{x}_0$; any number near the middle of the range of the
integrand will do the same services.

If used to its fullest extent, the system as described
changes a previously positive variable into one taking on also
negative values. Although only one change of sign is introduced
this will introduce some new backlash. Now, if instead of (2)
we mechanize

(3)    $x - q\dot{x}_0 t = \int (\dot{x} - q\dot{x}_0) \, dt$,

the new integrand does not change sign, and no new backlash is
introduced. On the other hand, the optimum scale factor for
(3) is only $\frac{1}{1-q}$ times that for (1), that is to say half
the scale factor for (2). We conclude that with proper corrections
for backlash the set-up (2) should prove best. However, if
enough frontlash units are not available at Aberdeen, the
set-up (3) may be tried with advantage.
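
The comparison of set-ups (1), (2) and (3) can be checked numerically. The following is a minimal sketch, not part of the original report: it assumes the integrand runs over the whole interval from q times its maximum up to the maximum, normalizes the maximum to 1, and takes the permissible scale factor to be inversely proportional to the largest magnitude the integrand reaches. The function name and the sample value q = 0.5 are illustrative only.

```python
def scale_factor_gains(q):
    """Scale factors of set-ups (2) and (3) relative to set-up (1).

    Assumes the integrand of (1) ranges over [q, 1] (maximum normalized
    to 1) and that the scale factor is limited only by the largest
    absolute value of the integrand actually fed to the disc.
    """
    if not 0.0 <= q < 1.0:
        raise ValueError("q must satisfy 0 <= q < 1")

    def largest_magnitude(offset):
        # Range of the integrand after a constant offset is subtracted.
        low, high = q - offset, 1.0 - offset
        return max(abs(low), abs(high))

    base = largest_magnitude(0.0)                # set-up (1): no offset
    centered = largest_magnitude((1.0 + q) / 2)  # set-up (2): mid-range offset
    shifted = largest_magnitude(q)               # set-up (3): offset q, no sign change
    return {"(1)": 1.0, "(2)": base / centered, "(3)": base / shifted}


gains = scale_factor_gains(0.5)
print(gains)
```

With q = 0.5 the sketch gives a gain of 2/(1 - q) = 4 for (2) and 1/(1 - q) = 2 for (3), in agreement with the factors stated in the text.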

2)  A similar device can obviously be used wherever
the range of the integrand does not utilize the integrator
disc to its fullest extent. This is true for almost all
integrators whose outputs are:

(i)    the horizontal displacement x,
(ii)   $s = \int v \, dt$, v being the speed,
(iii)  $e^{-hy}$, where y is the height.

In the first two cases the new set-up would not produce any
additional loading since the integrators are driven by the
independent-variable motor. In other cases an additional
loading would ensue which may have to be compensated by the
use of a larger scale factor on the t-shaft; this would
indirectly slow down the machine. Whether this will have to be
done is impossible to predict theoretically. Should it prove
necessary, it would be for the user to decide whether the gain
in accuracy is worth the loss in speed.

3)  If the above described device should prove in[...] the
following improvement [...] at the expense of considerable
manual work and loss [...]. The process of integration may be
stopped at convenient [...] and the [...] intervals. Consider,
for example, [...] indicated in the figure [...].




Here, even the usual procedure of integration utilizes the
entire range of the integrator disc and no gain can be achieved
by means of the device as described above. However, the integrand
may conveniently be treated by a double application of this
device, splitting the interval of integration into two parts.
In other words, instead of a given function f(x) we integrate
the difference between f(x) and a step-function. The output
of the integrator is no longer $F(x) = \int f \, dx$ but the
difference between F(x) and a triangular (or "roof") function.




Similarly,  with  a  convenient  subdivision  we  may  use  any  step- 
function  for  the  integrand  and  the  corresponding  polygonal 
line  for  the  integral. 

This  procedure  obviously  requires  resetting  the 
integrator  in  question  and  changing  one  gear  ratio  each  time 
the  machine  is  stopped.    On  the  other  hand,  the  increase  of  the 
scale factor is roughly proportional to the number of subintervals.
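
The claim that the gain grows roughly in proportion to the number of subintervals can be illustrated with a small numerical sketch. It is not from the report: the test integrand (a straight line filling the disc from -1 to +1), the equal subdivision, and the choice of the mid-range value as the step height on each subinterval are all assumptions made for illustration.

```python
def residual_half_range(f, a, b, pieces, samples=1000):
    """Largest |f - step| over [a, b] when the step function takes, on
    each of `pieces` equal subintervals, the mid-range value of f there."""
    worst = 0.0
    width = (b - a) / pieces
    for k in range(pieces):
        left = a + k * width
        values = [f(left + width * i / samples) for i in range(samples + 1)]
        # Subtracting the mid-range leaves a residual of half the local range.
        half_range = (max(values) - min(values)) / 2
        worst = max(worst, half_range)
    return worst


def f(x):
    # Hypothetical integrand that already uses the full disc, -1 to +1.
    return 2.0 * x - 1.0


# Scale-factor gain relative to integrating f directly (largest |f| = 1).
gain = {n: 1.0 / residual_half_range(f, 0.0, 1.0, n) for n in (1, 2, 4, 8)}
print(gain)
```

For this linear integrand the gain equals the number of subintervals; for other integrands the proportionality is only approximate, as the text says.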

4)    In principle this procedure may be looked upon
as a special case of the following more general method. Instead
of

(4)    $w(x) = \int y \, dx$

we mechanize

(5)    $w(x) + \phi(x) = \int (y + \phi') \, dx$,

where $\phi(x)$ is an arbitrary function and $\phi'(x)$ its derivative.
In practice, of course, $\phi(x)$ should be chosen so as to render
the maximum of $|y + \phi'|$ as small as possible in order to
increase the scale factor on the integrator. Now if $\phi(x)$ is
not a linear function, the mechanization of (5) would require
two new input tables or their equivalent. However, the
possibility of obtaining some special $\phi(x)$ by means of non-circular
gears should not be overlooked. This would mean a considerable
improvement of the linear method.
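
As a worked special case (added here for illustration, with the constant c and the bounds $y_{\max}$, $y_{\min}$ introduced as assumptions), a linear choice of the arbitrary function in (5) reduces the general method to the constant-offset device of section 1):

```latex
\phi(x) = -c\,x
\quad\Longrightarrow\quad
w(x) - c\,x = \int (y - c)\,dx ,
\qquad
\max |y - c| \ \text{is least for}\ c = \tfrac{1}{2}\,(y_{\max} + y_{\min}),
\ \text{where it equals}\ \tfrac{1}{2}\,(y_{\max} - y_{\min}).
```

When the integrand is the horizontal velocity, ranging between q times its maximum and the maximum itself, this value of c is (1 + q)/2 times the maximum, which is the constant appearing in set-up (2).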

5)  We have been asked by Dr. Dederick to consider
whether  it  would  be  advantageous  to  generate  from  an 
input  table  (instead  of  by  integration,  as  at  present).  The 

foregoing remarks contain an answer to this question. It is
not difficult to see that the present method of obtaining the
function by integration is more efficient. It would probably
become even more so if the recommendation 2) were put into
effect.

6)  Although it is in no direct connection with the
subject of this report, we enclose an Appendix describing a
simplified method for computing gear ratios. This method is
based on previous experience (of one of us) at M.I.T. and may
prove useful in connection with ballistic work on the Aberdeen
Analyzer.

Brown  University,  Providence,  R.I. 


Bell  Telephone  Laboratories,  N.Y. 
May  27,  1943. 

W.  Feller 
C.E.  Shannon 





In  this  appendix  a  simplified  method  of  determining 
gear  ratios  for  an  analyzer  set  up  will  be  described  which 
was used for some time on the M.I.T. analyzer and proved in
general  to  be  considerably  faster  and  easier  to  change  than 
the  original  method  of  equalities  and  inequalities.  The 
method  may  be  briefly  outlined  as  follows: 

1.  Draw  the  set  up  with  an  unknown  gear  ratio  in 
each  shaft  of  limited  displacement.    An  unspecified 
ratio  is  also  placed  in  the  two  inputs  of  each  adder. 

2.  Calculate  an  approximate  scale  factor  on  the 
independent  variable  to  give  the  expected  time  of 
solution  at  the  average  rate  at  which  it  turns. 
Choose  an  exact  scale  factor  near  this  approximate 
one which is a "round figure" in terms of obtainable
gear ratios - i.e., factorable into a small
number of simple rationals.

3.  Choose  in  the  same  way  scale  factors  for  all 
shafts  of  limited  displacement  -  integrator  inputs 
and  function  table  inputs,  and  outputs  -  so  as  not 
to  exceed  their  limits  with  expected  displacements. 

4.  This fixes, by division, and from the integrating
factor  of  the  integrators,  the  scale  factors  and 
gear  ratios  of  all  shafts  except  those  containing 
adders.    In  the  case  of  adders  the  input  shaft  with 
smallest  scale  factor  fixes  the  scale  factor  of  the 
adder,  the  other  input  being  geared  down  to  the  same 
scale  factor.    The  output  gear  in  the  adder  is  then 

5.  The set up is then inspected to see that no
integrators or other parts are too heavily loaded.
If they are, reduction gears are transferred from
inputs to outputs to reduce loads when possible,
otherwise the scale factor on the independent
variable is increased.

In case the ratios come out too complicated, different
scale factors are chosen in Step 3. With a little
practice  and  foresight,  however,  it  is  possible  to  obtain 
suitable  ratios  on  the  first  trial. 
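
Step 2 of the outline (choosing a "round figure" near an approximate scale factor) can be sketched as a small search. This is an illustration only: the stock of simple ratios below and the limit of three gear pairs are hypothetical, not taken from the M.I.T. or Aberdeen machines.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

# Hypothetical stock of simple gear ratios (an assumption for illustration).
SIMPLE_RATIOS = [
    Fraction(1, 2), Fraction(2, 3), Fraction(3, 4), Fraction(4, 5),
    Fraction(1, 1),
    Fraction(5, 4), Fraction(4, 3), Fraction(3, 2), Fraction(2, 1),
]


def round_figure(approx, max_gears=3):
    """Return (ratio, gears): the product of at most `max_gears` simple
    ratios lying closest to the approximate scale factor `approx`."""
    best = None
    for count in range(1, max_gears + 1):
        for combo in combinations_with_replacement(SIMPLE_RATIOS, count):
            product = Fraction(1)
            for ratio in combo:
                product *= ratio
            error = abs(product - approx)
            # Keep the first combination found for any given error, so
            # shorter gear trains win ties.
            if best is None or error < best[0]:
                best = (error, product, combo)
    return best[1], best[2]


ratio, gears = round_figure(2.3)
print(ratio, gears)
```

The same search could be repeated for each shaft of limited displacement in Step 3; if the resulting ratios are awkward, a different nearby target is tried, as the closing paragraph of the appendix suggests.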



Two New Circuits for Alternate Pulse Counting

The well known W-Z relay circuit is shown in
Fig. 1. A is a pulsing contact which is alternately opened
and closed. Indicating closure of contacts by 0 and openness
by 1 and for relays 0 for operated (up) and 1 for
unoperated (down) the circuit goes through the following
periodic cycle of operation:



















Thus  one  complete  cycle  requires  two  complete  pulses  on  A. 

This  note  describes  two  apparently  new  circuits 
which  perform  the  same  function.    These  are  shown  in  Fig.  2 
and  Fig.  3.    The  operating  cycles  for  these  are: 
Fig.  2  Fig.  3 































These  three  circuits  may  be  compared  with  regard 
to  the  number  of  elements  required  as  follows: 

            Relays   Contacts                    Resistances

Figure 1      2      1 continuity, 1 transfer        2

Figure 2      2      2 continuity, 1 break           1

Figure 3      2      2 transfer, 1 make              1

In  Fig.  3  the  resistance  is  theoretically  superfluous; 

if  the  transfer  elements  could  be  trusted  never  to  be  shorted 
it  could  be  omitted,  but  in  practice  would  be  necessary  to 
avoid  shorts  when  the  relays  were  being  adjusted.    Figs.  2  and 
3  are  essentially  duals,  and  3  was  obtained  from  2  by  the 
duality  theorem. 

In Fig. 2 it may be noted that the two relays are
up when A is closed, while in the standard circuit they are both
up when A is open. This might be desirable in some applications.
Fig.  3  has  the  possible  disadvantage  that  both  ends  of  the 
pulsing  contact  A  are  connected  into  the  circuit,  while  in  1 
and  2  one  end  can  be  grounded. 

C. E. SHANNON


[Figs. 1, 2, 3: relay circuit diagrams]




Counting Up or Down with Pulse Counters

With binary counters of either relay or electronic
type it is possible by simple modification to count both up and
down. Suppose the largest number that can be registered is L.
Defining the complement of any number $N \le L$ by $N' = L - N$ we
note that subtracting a number n from N is equivalent to adding n
to its complement and taking the complement of the result. Thus if
in a binary counter we take the complement of a reading (which
means locking up the relays which are down and vice versa in the
relay case, and putting out the tubes which are conducting and
vice versa in the electronic case) and then let the counter
continue to add the number of pulses in question, and finally take
the complement again, we have subtracted the number. Actually,
however, this process can be done simply by transferring the
carryover leads to the opposite digit (tube or relay). In the relay
case this amounts to a transfer contact between each adjacent pair
of digits, and an additional make contact. In the electronic case
the carryover leads go from the "one" tube plate to grids on the
next stage. Here we could insert an electronic transfer contact,
as shown, for example, in Figure 1. When we wish to add, the common
control lead for "add" is given cutoff voltage, the "subtract" lead
a large negative voltage. A positive impulse on the "one" plate of
a stage then causes one side of the double triode to conduct,
giving a negative impulse to the next grid, for a carryover. For
subtraction the voltages on the control leads are reversed and
carryover occurs when the "zero" plate voltage increases, i.e.,
when this tube goes out.

C. E. SHANNON
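
The complement argument in the note above can be checked with a short simulation. This is a sketch under stated assumptions, not a model of the relay or tube circuit: the counter is taken to have a fixed number of binary digits, so the largest registrable number is 2^digits - 1, and an ordinary up-count wraps around past that value. The function names are illustrative.

```python
def complement(reading, largest):
    """Complement of a reading with respect to the largest registrable
    number; for a binary counter this inverts every digit."""
    return largest - reading


def subtract_by_counting_up(reading, pulses, digits):
    """Subtract `pulses` from `reading` using only complementing and
    ordinary up-counting on a `digits`-place binary counter."""
    largest = 2 ** digits - 1
    modulus = largest + 1
    state = complement(reading, largest)  # first complement
    for _ in range(pulses):
        state = (state + 1) % modulus     # normal up-count with wraparound
    return complement(state, largest)     # complement again


print(subtract_by_counting_up(13, 5, 4))  # 13 - 5 on a four-digit counter
```

Moving the carryover leads to the opposite digit, as the note proposes, achieves the same result without the two explicit complementing operations.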


Cover Sheet for Technical Memoranda
Research Department

subject:  Circuits for a P.C.M. Transmitter and Receiver -
Case 20878

 1 - S.A.S., H.W.B., H.F.
 2 - Case Files
 4 - G.W.Gilman
 5 - H.W.Bode
 6 - A.G.Jensen
 7 - W.M.Goodall
 8 - E.Peterson
 9 - H.S.Black
10 - W.F.Simpson - Patent Dept.
11 - J.R.Pierce
12 - R.L.Dietzold
13 - C.B.Feldman
14 - W.T.Wintringham
15 - F.B.Llewellyn
16 - C.H.Elmendorf
17 - B.M.Oliver
18 - C.E.Shannon

date  June 1, 1944
author  C.E.Shannon


Circuits are described for a P.C.M. transmitter
and receiver. The transmitter operates on the principle
of counting in the binary system the number of quanta
of charge required to nullify the sampled voltage.



Circuits for a P.C.M. Transmitter and Receiver - Case 20878

June  1,  1944 


The  circuits  shown  in  the  present  memorandum  are 
intended  to  fill  the  boxes  of  the  block  functional  designs 
for  a  PCM  transmitter  and  receiver  shown  in  Fig.  6  of  a  December 
1943 memorandum (MM-43-110-43). The transmitter functional
diagram  is  shown  here  as  Fig.  1  and  the  general  operation 
is  as  follows.    The  incoming  signal  is  sampled  periodically 
by  closing  the  electronic  switch  1  with  periodic  impulses 
from  the  timer.    This  charges  condenser  C  to  the  sampled 
voltage  and  the  electronic  switch  opens  after  each  impulse 
isolating  the  condenser  from  the  signal.    The  existence  of 
a voltage across the condenser causes the comparator to close
electronic  switch  2  which  allows  pulses  of  charge  to  feed 
into  the  condenser  from  the  pulse  generator,  discharging  the 
condenser.    The  number  of  these  pulses  is  counted  in  the 
binary  system  by  the  binary  counter  and  when  the  condenser 
is reduced to a reference voltage, the comparator opens
electronic switch 2. Near the end of the sampling period the
binary  counter  is  connected  to  the  distributer  which  registers 
the  binary  number  counted,  and  the  counter  is  then  reset  to 
zero;  both  of  these  operations  controlled  by  impulses  from  the 
timer.    The  distributer  then  sends  a  series  of  pulses  or  not 
down  the  output  line  according  as  the  binary  digits  are 
1  or  0.    These  digits  are  sent  in  reverse  order,  the  least 
important  being  sent  first,  to  tie  in  with  the  contemplated 
receiver  circuit. 
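
The counting principle of the transmitter can be sketched in a few lines. This is an idealized illustration, not a description of the tube circuits: the quantum size, the number of digits, the zero reference voltage, and the function name are all assumptions, and the comparator is taken to act instantly.

```python
def encode_sample(voltage, quantum, digits, reference=0.0):
    """Idealized counting encoder for one sample.

    Quanta of charge are removed from the condenser until its voltage is
    no longer above the reference; the number removed is the code.  The
    binary digits are returned least significant first, the order in
    which the memorandum says they are sent.
    """
    limit = 2 ** digits - 1   # a counter of this many digits cannot go higher
    charge = voltage
    count = 0
    while charge > reference and count < limit:
        charge -= quantum     # one pulse of charge from the pulse generator
        count += 1            # registered by the binary counter
    return [(count >> k) & 1 for k in range(digits)]


print(encode_sample(6.0, 1.0, 5))
```

Because counting stops only once the reference is reached or passed, a sample lying between two levels is rounded up to the next whole number of quanta in this sketch; whether the actual circuit rounds in the same direction depends on the comparator threshold, which this summary does not specify.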

The  specific  circuits  are  shown  in  Figs.  2  to  8,  and 
detailed  descriptions  of  their  operation  follow. 

Fig.  2  shows  the  electronic  switch  1  which  charges  the 
condenser  C  to  the  signal  voltage  at  the  sampling  times.  The 
signal  wave  is  biased  up  so  that  its  minimum  value  is  slightly 
positive, and impressed on terminal 1 as a voltage; i.e., the
signal  source  as  seen  from  terminal  1  is  assumed  to  be  of  low 
impedance.    The  timer,  at  the  sampling  time  puts  a  positive 
pulse  on  terminal  2,  which  is  inverted  by  the  triode  to  give 
a  negative  pulse  on  the  pentode  control  grid.    This  causes  the 
pentode  which  was  previously  conducting  to  cut  off.  Before 
the  pulse  condenser  C  had  a  small  minimum  positive  charge 
and  neither  diode  was  conducting  since  the  plates  were  held 
at a low positive potential by the pentode current. As the
pentode cuts off, the diode plates swing positive and the right
hand diode starts to conduct, charging the condenser. As this
condenser voltage builds up exponentially the voltage on the
diode plates also increases positively until it reaches the
signal voltage and at that instant the left hand diode starts
to conduct. The voltage stops rising at this point since the
plates are now essentially short circuited to the low impedance
signal source. This all occurs during the timing pulse, and
at the end of this pulse the pentode again starts conducting,
dropping the diode plates to a small positive voltage, less
than the minimum signal voltage, and isolating the condenser.

Fig.  3  shows  a  standard  multi-vibrator  circuit  for 
giving  a  series  of  square  pulses.    The  coil  condenser  cross 
connection  of  plates  to  grids  causes  the  grid  transient  to 
be  a  cosine  curve  which  crosses  the  cut  off  grid  voltage  at 
a  time  determined  essentially  by  the  LC  product  and  independent 
of  amplitude  changes  due  to  variations  in  plate  supply,  etc. 
As  this  point  determines  the  period  of  oscillation,  the 
oscillator  has  good  frequency  stability.    The  output  appears 
on  terminal  6  as  a  square  wave. 

Fig.  4  is  the  comparator,  which  is  actually  only  a 
differential  amplifier  with  sufficient  gain  so  that  the 
granularity  voltage  applied  to  the  input  is  capable  of 
driving  the  amplifier  from  saturation  in  one  direction  to 
saturation  in  the  other.    The  input  is  the  voltage  on  condenser 
C  which  immediately  after  a  sampling  instant,  will  be  at  the 
sampled  signal  voltage.    This  voltage  starts  decreasing  by 
steps  as  the  condenser  is  discharged  and  when  the  condenser 
voltage  applied  to  terminal  3  moves  down  the  step  which  crosses 
the  differential  amplifier  threshold,  the  amplifier  swings  from 
saturation  with  output  terminal  5  at  nearly  zero  voltage  to 
a  high  negative  voltage. 

The  electronic  switch  2  is  shown  in  Fig.  5.  This 
circuit  sends  units  of  charge  into  the  condenser  through 
terminal  3  under  the  control  of  the  comparator  output  coming 
in  on  terminal  5.    The  multi-vibrator  output  is  connected  to 
terminal  6  and  the  output  of  the  multi-grid  tube  will  be  a 
square  wave  when  5  is  positive,  which  ceases  when  the 
comparator  swings  to  the  other  saturation  point  driving  the 
voltage  on  5  in  the  negative  direction.    The  double  diode 
connection  gives  a  pump  action.    When  the  plate  voltage  of 
the  multi-grid  tube  increases  to  the  upper  part  of  the  square 
wave,  the  charge  flows  into  the  condenser  from  terminal  4 
through the left diode. During the lower part of this wave
the condenser discharges through the right diode out into the
condenser  C,  via  terminal  3.    As  this  causes  the  potential  of 
3  to  decrease  gradually  down  a  step  function,  it  is  necessary 
for  the  input  voltage  at  4  to  decrease  similarly;  otherwise 
the  difference  in  voltage  between  3  and  4  would  cause  the  size 
of  quanta  to  decrease  gradually.    This  lowering  of  the  voltage 
on  4  is  accomplished  by  a  cathode  follower  arrangement  on  the 
first  cathodes  in  the  comparator,  which  follow  the  step  voltage 

The  binary  counter  is  shown  in  Fig.  6.    The  descending 
step  voltage  which  appears  on  condenser  C  is  applied  to  the 
input  of  this  circuit  through  terminal  3.    The  input  resistance 
condenser  combination  serves  as  a  differentiating  circuit  (the 
time  constant  fairly  small  compared  to  the  time  between  steps) 
so  that  the  voltage  applied  to  the  first  grid  of  the  double 
triode  consists  of  a  series  of  negative  spikes.    The  double 
triode  is  simply  a  two  stage  resistance  coupled  amplifier,  and 
its  output  feeds  the  binary  counter  digit  tubes.    This  circuit 
is  of  standard  type  with  two  pentodes  in  each  stage  and  there 
are  two  stable  points  for  each  stage,  one  with  the  upper  tube 
cut  off  and  the  lower  tube  conducting,  and  the  other,  the  con- 
verse situation.    A  negative  impulse  from  a  preceding  stage 
applied  through  the  coupling  condensers  changes  the  state  from 
the  previous  stable  condition  to  the  opposite  one.    This  impulse 
is  applied  symmetrically  to  both  suppressors,  but  the  condenser 
across  the  cathode  resistances,  charged  in  one  direction  from 
the  previous  state,  biases  the  choice  of  the  next  state  toward 
the  opposite  one.    The  control  grids  of  the  "zero"  tubes  (the 
upper  row  which  are  conducting  when  the  corresponding  binary 
digits  are  zero)  are  connected  to  a  common  control  lead  which 
is  used  to  reset  the  reading  to  zero  after  the  reading  is  reg- 
istered by  the  distributor.    This  is  accomplished  by  a  neg- 
ative impulse  from  the  timer.    The  outputs  to  the  distributer 
are  taken  off  the  plates  of  the  "unit"  tubes. 

The distributer is shown in Fig. 7. After the
number  of  quanta  of  charge  has  been  counted  in  the  binary 
counter,  the  leads  11,  12,  13,  14,  15  will  have  either  low 
positive  voltages  or  B+,  according  as  the  corresponding  digit 
is  one  or  zero.    The  grids  of  the  left  triode,  will  then  be 
either  negative  or  positive  from  the  potentiometer  action 
to  the  negative  voltage  C-.    To  register  the  counter  reading, 
a  positive  pulse  from  the  timer  is  applied  to  the  control 
grid  of  the  common  pentode  allowing  it  to  conduct  and  pulling 
the  cathode  of  the  left  triode  and  the  diode  in  all  stages 
negatively.    If  a  digit  is  zero,  the  potential  of  the  cathodes 
in  that  stage  stops  at  a  positive  value  due  to  current  through 
the  triode  and  the  diode  does  not  conduct.    If  the  digit  is 
one the cathodes are pulled negative and the corresponding
condenser $C_0$ is discharged through the diode and pentode.
At the end of the registering pulse, the cathodes go positive
again, isolating each $C_0$, with the digit registered as
presence or absence of charge. The reading is taken off the
series of condensers $C_0$ in sequence by positive pulses from
the timer on leads 21, 22, 23, 24, 25. These pulses allow
the right hand triodes to conduct and each $C_0$ in turn to
charge through the output lead, leaving them in the normal
state (at a voltage about equal to the pulse voltage). If
the digit is "zero" no charge of $C_0$ from the output lead
occurs. Thus negative pulses appear on the output when and
only when the registered digits are one.

The  timer  system  is  shown  in  Fig.  8.    An  oscillator 
which  may  be  synchronized  subharmonically  with  the  pulse 
generating  multi-vibrator,  operates  at  the  sampling  frequency. 
This  passes  through  the  clipper  amplifier  to  give  a  square 
wave,  which  is  differentiated  to  give  alternating  positive 
and  negative  spikes.    A  second  clipper  amplifier  eliminates 
the  negative  spikes  and  makes  the  positive  ones  rectangular. 
These  short  rectangular  pulses  are  fed  into  a  delay  line 
terminated  in  its  characteristic  impedance.    The  timing  pulses 
needed  for  the  various  circuit  functions  are  tapped  off  at 
the  appropriate  places  as  indicated.    A  synchronizing  pulse 
may  also  be  taken  off  the  same  delay  line. 

Fig.  9  shows  the  receiver  circuit.    The  signal 
passes  through  the  clipping  amplifier  which  is  adjusted  to  give 
a  saturation  voltage  on  the  output  if  a  pulse  is  present  and 
none  if  absent.    This  output  is  applied  to  the  grid  of  a 
multigrid  pentode,  whose  other  control  grid  is  given  positive 
gating  pulses  at  the  center  of  the  digit  intervals.  These 
gating  pulses  allow  the  pentode  to  conduct  if  a  pulse  is  present 
and  the  plate  current  is  then  independent  of  the  plate  voltage 
(providing  this  stays  within  certain  limits)  so  that  if  a 
pulse  is  present,  a  fixed  amount  of  charge  (equal  to  the 
length  of  the  gate  times  the  pentode  current)  flows  onto  the 
condenser.    The  time  constant  of  the  R  C  system  (including  the 
pentode  load  resistance)  is  adjusted  to  allow  the  voltage  to 
restore  itself  halfway  toward  the  equilibrium  value  in  the 
time  from  one  digit  to  the  next,  so  that  after  all  pulses 
have been collected on the condenser, the charge contributions
of the first, second, third etc. have decayed by factors of
$1/2^{n-1}$, $1/2^{n-2}$, ..., 1, n being the number of digits.
At this time a positive gating pulse is put on the grid of the
second pentode, allowing the condenser to discharge rapidly into
the low pass filter. The timer system can be realized with the
systems shown in either Fig. 10 or Fig. 11.
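
The halving action of the receiver condenser can be illustrated numerically. The sketch below is an idealization, not the circuit itself: the equilibrium voltage is taken as zero, each pulse present is assumed to add exactly one unit of charge, and the decay between digits is assumed to be exactly one half. The function name and unit charge are illustrative.

```python
def decode_pulses(bits_lsb_first, unit_charge=1.0):
    """Charge left on the receiver condenser after all digits arrive.

    Between successive digits the stored charge decays halfway toward
    equilibrium (taken as zero here); each pulse present adds a fixed
    unit of charge.  Digits arrive least significant first.
    """
    charge = 0.0
    for index, bit in enumerate(bits_lsb_first):
        if index > 0:
            charge *= 0.5          # one digit interval of decay
        if bit:
            charge += unit_charge  # gated pentode delivers a fixed charge
    return charge


digits = [0, 1, 1, 0, 0]           # the number 6, least significant first
print(decode_pulses(digits) * 2 ** (len(digits) - 1))
```

The final charge is the transmitted number divided by 2^(n-1) for an n-digit code, which is why sending the least important digit first suits this receiver: the earliest digit has decayed the most by the time the condenser is read.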

C. E. SHANNON


Figs.  1  to  11 



Pulse Shape to Minimize Band Width with Nonoverlapping Pulses

We consider the problem of shaping pulses $\phi(t)$ which
are zero outside $-\pi$, $\pi$ in such a way as to minimize the band
width of the power spectrum of the ensemble of functions formed
by sending a sequence of the functions $\phi(t)$ and 0, with spacing
of $2\pi$, the probabilities of either being 1/2.

First a theorem will be proved on the spectrum of
such ensembles of functions.

Theorem:  Let an ensemble of functions be defined by

$f(t) = \sum_{n=-\infty}^{\infty} a_n \phi(t + 2\pi n)$

where the $a_n$ are chosen independently and are equally likely to
be one or zero. The power spectrum of f(t) then consists of
two parts, a point spectrum consisting of the spectrum of
$\frac{1}{2} \sum \phi(t + 2\pi n)$, i.e. the spectrum of $\phi(t)$
repeated, and a continuous part consisting of the energy spectrum
of $\phi(t)$.

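Consider the autocorrelation of f(t)

$\psi(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t) \, f(t+\tau) \, dt$

$= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \left\{ \sum_n a_n \phi(t + 2\pi n) \sum_m a_m \phi(t + \tau + 2\pi m) \right\} dt$

The integrand can be written

$\sum_{n \ne m} a_n a_m \, \phi(t + 2\pi n) \, \phi(t + \tau + 2\pi m)$
$+ \frac{1}{2} \sum_n a_n^2 \, \phi(t + 2\pi n) \, \phi(t + \tau + 2\pi n)$
$+ \frac{1}{2} \sum_n a_n^2 \, \phi(t + 2\pi n) \, \phi(t + \tau + 2\pi n)$

When we average, the sum of the first two parts gives the
autocorrelation of the function $\frac{1}{2} \sum \phi(t + 2\pi n)$
since the coefficients $a_n a_m$ ($n \ne m$) have one chance in four
of being both equal to one, and in the second term $\frac{1}{2} a_n^2$
has the same mean value.

The last term in the limit reduces to a constant multiple of

$\int \phi(t) \, \phi(t + \tau) \, dt$

since the division by 2T compensates for the number of terms.
These two parts give the discrete and continuous parts
of the spectrum, the first being the autocorrelation of $\phi(t)$
repeated and the second giving the energy spectrum of $\phi(t)$.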
In case $\phi(t) = 0$ outside $-\pi$, $\pi$, the discrete part has
power at $\omega = 0, 1, 2, 3, \ldots$ according to

$f(t) = \frac{a_0}{2} + \sum a_n \cos nt + \sum b_n \sin nt.$

Suppose we wish to choose $\phi(t)$ lying within $-\pi$, $\pi$ in
such a way as to minimize the band spread of the spectrum as
measured by

$W = \int \omega^2 P(\omega) \, d\omega.$

The contributions of the two parts of the spectrum can be added,
and that from the discrete part is

For the continuous part, using the theorem that
$\int \omega^2 F^2(\omega) \, d\omega = \int [f'(t)]^2 \, dt$
where $F(\omega)$ and f(t) are Fourier transforms, we have

*t  •        f*U)f  -  £  ten1  •  h**  *a  ♦  **a*  *  «*♦...! 

i.e., the same as the discrete contribution. The total W is therefore

To minimize W with a fixed total energy per pulse
and with boundary conditions $\phi(\pm\pi) = 0$ we must obviously
place all the energy in the first term, a cosine curve displaced
to be tangent to the t axis.




Cover Sheet for Technical Memoranda
Research Department

subject:  A Mathematical Theory of Cryptography - Case 20878

 1 - [...] - H.F. - Case Files
 2 - Case Files
 3 -
 4 - S. Black
 6 - B. Llewellyn
 7 -
 8 - M. Oliver
 9 - E. Potter
10 - B. H. Feldman
11 - C. Mathes
12 - V. L. Hartley
13 - R. Pierce
14 - W. Bode
15 - L. Dietzold
16 - A. MacColl
17 - A. Shewhart
18 - A. Schelkunoff
19 - E. Shannon
20 - Dept. 1000 Files

MM-45-110-92
date  September 1, 1945
author  C. E. Shannon
index no.  P0.4


A mathematical theory of secrecy systems is
developed. Three main problems are considered. (1) A
logical formulation of the problem and a study of the
mathematical structure of secrecy systems. (2) The
problem of "theoretical secrecy," i.e., can a system be
solved given unlimited time and how much material must
be intercepted to obtain a unique solution to cryptograms.
A secrecy measure called the "equivocation" is defined
and its properties developed. (3) The problem of
"practical secrecy." How can systems be made difficult
to solve, even though a solution is theoretically
possible?

A Mathematical Theory of Cryptography - Case 20878

September 1, 1945
Index P0.4

Introduction and Summary

In the present paper a mathematical theory of
cryptography and secrecy systems is developed. The entire
approach is on a theoretical level and is intended to complement
the treatment found in standard works on cryptography.*
There, a detailed study is made of the many standard types of
codes and ciphers, and of the ways of breaking them. We will
be more concerned with the general mathematical structure and
properties of secrecy systems.

The presentation is mathematical in character. We
first define the pertinent terms abstractly and then develop
our results as lemmas and theorems. Proofs which do not
contribute to an understanding of the theorems have been placed
in the appendix.

The  mathematics  required  is  drawn  chiefly  from 
probability  theory  and  from  abstract  algebra.    The  reader  is 
assumed  to  have  some  familiarity  with  these  two  fields.  A 
knowledge of the elements of cryptography will also be
helpful although not required.

The treatment is limited in certain ways. First,
there are two general types of secrecy system; (1) concealment
systems, including such methods as invisible ink, concealing
a message in an innocent text, or in a fake covering
cryptogram, or other methods in which the existence of the
message is concealed from the enemy; (2) "true" secrecy systems
where the meaning of the message is concealed by cipher, code,
etc., although its existence is not hidden. We consider only
the second type--concealment systems are more of a psychological
than a mathematical problem. Secondly, the treatment is limited
to the case of discrete information, where the information to
be enciphered consists of a sequence of discrete symbols, each
chosen from a finite set. These symbols may be letters in a
language, words of a language, amplitude levels of a "quantized"
speech or video signal, etc., but the main emphasis and thinking
has been concerned with the case of letters. A preliminary
survey indicates that the methods and analysis can be generalized
to study continuous cases, and to take into account the
special characteristics of speech secrecy systems.

*See, for example, H. F. Gaines, "Elementary Cryptanalysis,"
or M. Givierge, "Cours de Cryptographie."

The paper is divided into three parts. The main results
of these sections will now be briefly summarized. The
first part deals with the basic mathematical structure of
language and of secrecy systems. A language is considered for
cryptographic purposes to be a stochastic process which produces
a discrete sequence of symbols in accordance with some
systems of probabilities. Associated with a language there
is a certain parameter D which we call the redundancy of the
language. D measures, in a sense, how much a text in the
language can be reduced in length without losing any information.
As a simple example, if each word in a text is repeated
a reduction of 50 per cent is immediately possible. Further
reductions may be possible due to the statistical structure of
the language, the high frequencies of certain letters or words,
etc. The redundancy is of considerable importance in the study
of secrecy systems.

A secrecy system is defined abstractly as a set of
transformations of one space (the set of possible messages)
into a second space (the set of possible cryptograms). Each
transformation of the set corresponds to enciphering with a
particular key and the transformations are supposed reversible
(non-singular) so that unique deciphering is possible when the
key is known.

Each key and therefore each transformation is assumed
to have an a priori probability associated with it--the
probability of choosing that key. The set of messages or message
space is also assumed to have a priori probabilities for the
various messages, i.e., to be a probability or measure space.

In the usual cases the "messages" consist of sequences
of "letters." In this case as noted above the message space is
represented by a stochastic process which generates sequences of
letters according to some probability structure.

These probabilities for various keys and messages are
actually the enemy cryptanalyst's a priori probabilities for
the choices in question, and represent his a priori knowledge
of the situation. To use the system a key is first selected
and sent to the receiving point. The choice of a key determines
a particular transformation in the set forming the system. Then
a message is selected and the particular transformation applied
to this message to produce a cryptogram. This cryptogram is

transmitted to the receiving point by a channel that may be
intercepted by the enemy. At the receiving end the inverse
of the particular transformation is applied to the cryptogram
to recover the original message.

If the enemy intercepts the cryptogram he can calculate
from it the a posteriori probabilities of the various
possible messages and keys which might have produced this
cryptogram. This set of a posteriori probabilities constitutes
his knowledge of the key and message after the interception.*
The calculation of these a posteriori probabilities is the
generalized problem of cryptanalysis.
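
The a posteriori calculation described above can be sketched on a very small system. Everything specific in the sketch is an assumption made for illustration: the three-letter alphabet, the cyclic shift cipher, the three candidate messages and their a priori probabilities are hypothetical and do not come from the text.

```python
from fractions import Fraction

ALPHABET = "ABC"  # toy three-letter alphabet (an assumption for this sketch)

def encipher(message, key):
    """Shift each letter `key` places forward, wrapping around the alphabet."""
    n = len(ALPHABET)
    return "".join(ALPHABET[(ALPHABET.index(ch) + key) % n] for ch in message)

def a_posteriori(cryptogram, message_prior, key_prior):
    """Return P(message, key | cryptogram) by Bayes' rule."""
    joint = {}
    for m, pm in message_prior.items():
        for k, pk in key_prior.items():
            if encipher(m, k) == cryptogram:
                joint[(m, k)] = pm * pk  # message and key are chosen independently
    total = sum(joint.values())
    if total == 0:
        raise ValueError("cryptogram cannot arise in this system")
    return {pair: p / total for pair, p in joint.items()}

# Hypothetical a priori probabilities for three two-letter messages.
message_prior = {"AB": Fraction(1, 2), "BC": Fraction(1, 4), "AA": Fraction(1, 4)}
key_prior = {k: Fraction(1, 3) for k in range(3)}  # all keys equally likely

post = a_posteriori("BC", message_prior, key_prior)
for (m, k), p in sorted(post.items()):
    print(f"message {m}, key {k}: a posteriori probability {p}")
```

In this toy case the message "AA" is ruled out by the interception, and the remaining probability is shared between the two message-key pairs that could have produced the cryptogram.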

As an example of these notions, in a simple substitution
cipher with random key there are 26! transformations,
corresponding to the 26! ways we can substitute for 26 different
letters. These are all equally likely and each therefore
has an a priori probability 1/26!. If this is applied
to "normal English," the cryptanalyst being assumed to have no
knowledge of the message source other than that it is English,
the a priori probabilities of various messages of N letters
are merely their frequency in normal English text.

If the enemy intercepts N letters of cryptogram in
this system his probabilities change. If N is large enough
(say 50 letters) there is usually a single message of a posteriori
probability nearly unity, while all others have a total
probability nearly zero. Thus there is an essentially unique
"solution" to the cryptogram. For N smaller (say N = 15) there will
usually be many messages and keys of comparable probability,
with no single one nearly unity. In this case there are multiple
"solutions" to the cryptogram.

Considering a secrecy system to be a set of transformations
of one space into another with definite probabilities
associated with each transformation, there are two natural
combining operations which produce a third system from two given
systems. The first combining operation is called the product
operation and corresponds to enciphering the message with the
first system R and enciphering the resulting cryptogram with
system S, the keys for R and S being chosen independently.
This total operation is a secrecy system whose transformations
consist of all the products (in the usual sense of products of
transformations) of transformations in S with transformations
in R. The probabilities are the products of the probabilities
for the two transformations.

The second combining operation is "weighted addition":

$$T = pR + qS, \qquad p + q = 1.$$

It corresponds to making a preliminary choice as to whether
system R or S is to be used with probabilities p and q,
respectively. When this is done R or S is used as originally defined.

*"Knowledge" is thus identified with a set of propositions having
associated probabilities. We are here at variance with the
doctrine often assumed in philosophical studies which considers
knowledge to be a set of propositions which are either true or
false.

It is shown that secrecy systems with these two combining
operations form essentially a "linear associative algebra"
with a unit element, an algebraic variety that has been
extensively studied by mathematicians. Some of the properties of
this algebra are developed.
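
The two combining operations can be tried out on a very small case. The sketch below is only an illustration under stated assumptions: a secrecy system is represented as a dictionary from transformations (permutations of a three-symbol alphabet, written as tuples) to probabilities, and the function names `product_system` and `weighted_sum` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def product_system(R, S):
    """Encipher with R, then with S; the two keys are chosen independently."""
    T = defaultdict(Fraction)
    for r, pr in R.items():
        for s, ps in S.items():
            composed = tuple(s[r[x]] for x in range(len(r)))  # x -> s(r(x))
            T[composed] += pr * ps  # coinciding products pool their probability
    return dict(T)

def weighted_sum(p, R, q, S):
    """Use R with probability p and S with probability q, where p + q = 1."""
    if p + q != 1:
        raise ValueError("p + q must equal 1")
    T = defaultdict(Fraction)
    for r, pr in R.items():
        T[r] += p * pr
    for s, ps in S.items():
        T[s] += q * ps
    return dict(T)

third = Fraction(1, 3)
# R: the three cyclic shifts of a 3-symbol alphabet, equally likely.
R = {(0, 1, 2): third, (1, 2, 0): third, (2, 0, 1): third}
# S: either leave the text alone or interchange the first two symbols.
S = {(0, 1, 2): Fraction(1, 2), (1, 0, 2): Fraction(1, 2)}

RS = product_system(R, S)
T = weighted_sum(Fraction(1, 4), R, Fraction(3, 4), S)
print(len(RS), "transformations in the product")
print("total probability in the weighted sum:", sum(T.values()))
```

In this small case the product contains all six substitutions on three symbols, each with probability 1/6, while in the weighted sum the identity transformation collects probability from both systems.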

Among the many possible secrecy systems there is one
type with many special properties. This type we call a "pure"
system. A system is pure if for any three transformations $T_i$,
$T_j$, $T_k$ in the set the product

$$T_i T_j^{-1} T_k$$

is also a transformation in the set, and all keys are equally
likely. That is, enciphering, deciphering, and enciphering with
any three keys must be equivalent to enciphering with some key.

With a pure cipher it is shown that all keys are
essentially equivalent--they all lead to the same set of a
posteriori probabilities. Furthermore, when a given cryptogram
is intercepted there is a set of messages that might have
produced this cryptogram (a "residue class") and the a posteriori
probabilities of messages in this class are proportional to the
a priori probabilities. All the information the enemy has
obtained by intercepting the cryptogram is a specification of the
residue class. Many of the common ciphers are pure systems,
including simple substitution with random key. In this case
the residue class consists of all messages with the same pattern
of letter repetitions as the intercepted cryptogram.
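
A minimal sketch of the two ideas in this paragraph follows. It checks only the closure half of the purity condition (the equal-likelihood of keys is not examined) and computes the "pattern of letter repetitions" that labels a residue class under simple substitution. The function names and the small example systems are assumptions for illustration.

```python
from itertools import product

def compose(f, g):
    """Return the permutation x -> f(g(x))."""
    return tuple(f[g[x]] for x in range(len(g)))

def inverse(t):
    """Return the inverse of a permutation given as a tuple."""
    inv = [0] * len(t)
    for x, y in enumerate(t):
        inv[y] = x
    return tuple(inv)

def closed_under_triple_product(transformations):
    """True if every product T_i T_j^-1 T_k is again in the set."""
    ts = set(transformations)
    return all(compose(compose(ti, inverse(tj)), tk) in ts
               for ti, tj, tk in product(ts, repeat=3))

def repetition_pattern(text):
    """Replace each letter by the order of its first appearance."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in text)

# The four cyclic shifts of a 4-symbol alphabet satisfy the closure condition.
shifts = [tuple((x + k) % 4 for x in range(4)) for k in range(4)]
# Identity plus two different interchanges do not.
not_closed = [(0, 1, 2), (1, 0, 2), (0, 2, 1)]

print(closed_under_triple_product(shifts))
print(closed_under_triple_product(not_closed))
print(repetition_pattern("HELLO") == repetition_pattern("XYZZW"))
```

Two texts with the same repetition pattern can be carried into one another by some simple substitution, which is why the pattern is all the cryptogram reveals in that system.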

Two systems R and S are defined to be "similar" if
there exists a fixed transformation A with an inverse $A^{-1}$ such
that

$$R = AS.$$

If R and S are similar, a one-to-one correspondence between the
resulting cryptograms can be set up leading to the same a
posteriori probabilities. The two systems are cryptanalytically the
same.

The second main part of the paper deals with the problem
of "theoretical security." How secure is a system against
cryptanalysis when the enemy has unlimited time and manpower
available for the analysis of intercepted cryptograms?

"Perfect Secrecy" is defined by requiring of a system
that after a cryptogram is intercepted by the enemy the a
posteriori probabilities of this cryptogram representing various
messages be identically the same as the a priori probabilities
of the same messages before the interception. It is shown that
perfect secrecy is possible but requires, if the number of
messages is finite, the same number of possible keys--if the
message is thought of as being constantly generated at a given
"rate" R (to be defined later), key must be generated at the
same or a greater rate.
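
The definition can be checked directly on a one-letter toy system. The sketch below assumes a hypothetical cipher c = (m + k) mod n with equally likely keys and an arbitrary a priori distribution over three messages; none of these particulars come from the text.

```python
from fractions import Fraction

def a_posteriori_by_cryptogram(message_prior, keys, n):
    """For each possible cryptogram c, return the distribution P(m | c)."""
    key_prob = Fraction(1, len(keys))
    joint = {}
    for m, pm in message_prior.items():
        for k in keys:
            c = (m + k) % n  # toy cipher: add the key modulo n
            joint[(m, c)] = joint.get((m, c), Fraction(0)) + pm * key_prob
    result = {}
    for c in range(n):
        pc = sum(p for (_, cc), p in joint.items() if cc == c)
        if pc > 0:
            result[c] = {m: joint.get((m, c), Fraction(0)) / pc
                         for m in message_prior}
    return result

def has_perfect_secrecy(message_prior, keys, n):
    """True if every a posteriori distribution equals the a priori one."""
    posts = a_posteriori_by_cryptogram(message_prior, keys, n)
    return all(post == message_prior for post in posts.values())

# Hypothetical a priori probabilities for three messages 0, 1, 2.
prior = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

print(has_perfect_secrecy(prior, keys=[0, 1, 2], n=3))  # as many keys as messages
print(has_perfect_secrecy(prior, keys=[0, 1], n=3))     # fewer keys than messages
```

With three keys for three messages the interception changes nothing; with only two keys some message is excluded by each cryptogram, so the a posteriori probabilities differ from the a priori ones.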

If a secrecy system with a finite key is used, and N
letters of cryptogram intercepted, there will be, for the enemy,
a certain set of messages with certain probabilities, that this
cryptogram could represent. As N increases the field usually
narrows down until eventually there is a unique "solution" to
the cryptogram--one message with probability essentially unity
while all others are practically zero. A quantity Q(N) is
defined, called the equivocation, which measures in a statistical
way how near the average cryptogram of N letters is to a unique
solution; that is, how uncertain the enemy is of the original
message after intercepting a cryptogram of N letters. Various
properties of the equivocation are deduced--for example, the
equivocation of the key never increases with increasing N.
This quantity Q is a theoretical secrecy index--theoretical in
that it allows the enemy unlimited time to analyse the cryptogram.

The function Q(N) for a certain idealized type of
cipher called the random cipher is determined. With certain
corrections this function can be applied to many cases of
practical interest. This gives a way of calculating approximately
how much intercepted material is required to obtain a solution
to a secrecy system. It appears from this analysis that with
ordinary languages and the usual types of ciphers (not codes)
this "unicity distance" is approximately |K|/D. Here |K| is a
number measuring the "size" of the key space. If all keys are
a priori equally likely |K| is the logarithm of the number of
possible keys. D is the redundancy of the language and measures
the excess information content of the language. In simple
substitution with random key on English |K| is $\log_{10} 26!$ or about
20 and D is about .7 for English. Thus unicity occurs at about
30 letters.
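
The arithmetic of the |K|/D estimate can be sketched as follows. The function names are hypothetical, and the redundancy .7 is simply the figure quoted in the text. One caution for the reader: evaluated directly, log10 26! comes out near 26.6 rather than the round 20 used above, so the sketch prints the estimate for both values without claiming which the author intended.

```python
import math

def key_size_digits(num_keys):
    """|K| in decimal digits when all keys are a priori equally likely."""
    return math.log10(num_keys)

def unicity_distance(key_size, redundancy):
    """Approximate number of intercepted letters needed for a unique solution."""
    return key_size / redundancy

D_ENGLISH = 0.7  # redundancy figure quoted in the text, in digits per letter

exact_size = key_size_digits(math.factorial(26))
print(f"log10 26! = {exact_size:.2f} digits")
print(f"|K| = 20:       about {unicity_distance(20, D_ENGLISH):.0f} letters")
print(f"|K| = log10 26!: about {unicity_distance(exact_size, D_ENGLISH):.0f} letters")
```

Either way the estimate is a few dozen letters, which agrees in order of magnitude with the earlier remark that 50 letters of simple substitution usually determine the message.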

It is possible to construct secrecy systems with a
finite key for certain "languages" in which the function Q(N)
does not approach zero as $N \to \infty$. In this case, no matter how
much material is intercepted, the enemy still does not get a
unique solution to the cipher but is left with many alternatives,
all of reasonable probability. Such systems we call
ideal systems. It is possible in any language to approximate
such behavior--i.e., to make the approach to zero of Q(N) recede
out to arbitrarily large N. However, such systems have a
number of drawbacks, such as complexity and sensitivity to
errors in transmission of the cryptogram.

The third part of the paper is concerned with "practical
secrecy." Two systems with the same key size may both
be uniquely solvable when N letters have been intercepted, but
differ greatly in the amount of labor required to effect this
solution. An analysis of the basic weaknesses of secrecy systems
is made. This leads to methods for constructing systems
which will require a large amount of work to solve. A certain
incompatibility among the various desirable qualities of
secrecy systems is discussed.



1.    Choice, Information and Uncertainty

Suppose we have a set of possible events whose probabilities
of occurrence are $p_1, p_2, \ldots, p_n$. These probabilities
are known, but that is all we know concerning which event will
occur. Can we define a quantity which will measure in some
sense how "uncertain" we are of the outcome? How much "choice"
is involved in the selection of the event by the chance element
that operates with these probabilities? We propose as a numerical
measure of this rather vague notion the quantity

$$H = -\sum_{i=1}^{n} p_i \log p_i.$$

There are many reasons for this particular formula. Quantities
of this kind appear continually in the present paper and in the
study of the transmission of information.

To justify this definition we will state a number of
properties that follow from it. These properties will not be
proved here,* but are easily deduced from the definition.
Properties of $H = -\sum p_i \log p_i$.

1.  H = 0 if and only if all the $p_i$ but one are zero, this
one having the value unity. Thus only when we are certain
of the outcome does H vanish.

2.  For a given n, H is a maximum and equal to log n if and
only if all the $p_i$ are equal (i.e. 1/n). This is also
intuitively the most uncertain situation.

3.  Suppose there are two events in question, with m possibilities
for the first and n for the second. Let $p_{ij}$ be
the probability of the joint occurrence of i for the first
and j for the second. The uncertainty of the joint event
is

$$H = -\sum_{i,j} p_{ij} \log p_{ij}.$$

For given probabilities $p_i = \sum_j p_{ij}$ for the first and

* It is intended to develop these results in coherent fashion
in a forthcoming memorandum on the transmission of information.

$q_j = \sum_i p_{ij}$ for the second, the quantity H is maximized if
and only if the events are independent, i.e., $p_{ij} = p_i q_j$.
This maximum value is the sum of the individual uncertainties

$$H = H_1 + H_2 = -\sum_i p_i \log p_i - \sum_j q_j \log q_j.$$

These facts can be generalized to any number of different
events.

4.  Suppose there are two chance events A and B as in 3, not
necessarily independent. We define the mean conditional
uncertainty of B, knowing A, as

$$\bar{H}_A(B) = \sum_A p(A)\, H_A(B)$$

where $H_A(B)$ is the uncertainty of B when A has a definite
value A. Thus $\bar{H}_A(B)$ is the average uncertainty of B for
all different events A, weighted according to their different
probabilities of occurrence. The uncertainty of the
joint event is the sum of the uncertainty of the first and
the mean conditional uncertainty of the second. In symbols

$$H(A,B) = H(A) + \bar{H}_A(B).$$

This is true whether or not there are any causal connections
or correlations between the two events.

5.  In the same situation the uncertainty of B is not greater
than the joint uncertainty H(A,B),

$$H(B) \le H(A,B).$$

The equality holds if and only if every B (of probability
greater than zero) is consistent with only one A. That
is, if A is uniquely determined by B.

6.  From properties 3 and 4 we have

$$H(A) + H(B) \ge H(A,B)$$

$$H(B) \ge H(A,B) - H(A) = H(A) + \bar{H}_A(B) - H(A)$$

$$H(B) \ge \bar{H}_A(B).$$

Thus the uncertainty of B is not less than its average
value when we know A. Additional information never
increases average uncertainty. The equality holds if and
only if A and B are independent.

7.  Suppose we have a set of probabilities $p_1, p_2, \ldots, p_n$.
Any change toward equalization of these (supposing them
unequal) increases H. Thus if $p_1 < p_2$ and we increase $p_1$,
decreasing $p_2$ an equal amount (to keep the sum $\sum p_i$
constant at unity) so that $p_1$ and $p_2$ are more nearly equal,
then H increases. More generally if we perform any
"averaging" operation on the $p_j$, of the form

$$p_i' = \sum_j a_{ij}\, p_j$$

where $\sum_i a_{ij} = \sum_j a_{ij} = 1$ and all $a_{ij} \ge 0$, then H
increases (except in the special case where this transformation
amounts to no more than a permutation of the $p_j$, with H of
course remaining the same).

8.  H measures in a certain sense how much "information is
generated" when the choice is made. Suppose such a chance
event occurs and we wish to describe which of the n possible
events took place. The average amount of paper required
to write it down in a properly chosen notation is,
in the cases of interest to us, about proportional to H.
Thus there might be $10^{30} + 10^{50}$ possible events, with
$10^{30}$ of them having a probability of $\tfrac{1}{2} \cdot 10^{-30}$ and
$10^{50}$ of them having a probability of $\tfrac{1}{2} \cdot 10^{-50}$.
We could set up a notational system to describe which event
occurs as follows. We number the events from 1 up to
$10^{30} + 10^{50}$ and when one occurs write down the
corresponding number. The average amount of paper required
will be proportional to the average number of digits we need.
This will be nearly 30 if the event is in the first group of
$10^{30}$, and about 50 if in the second group. Thus the average
number of digits is about 40. We also have

$$H = -10^{30} \cdot \tfrac{1}{2} 10^{-30} \log\left(\tfrac{1}{2} 10^{-30}\right) - 10^{50} \cdot \tfrac{1}{2} 10^{-50} \log\left(\tfrac{1}{2} 10^{-50}\right) \approx 40.$$

9.  Although the last result is only approximately true if
the number of choices is finite it becomes exactly true
when an unlimited sequence of choices is made. Thus if
a sequence of N independent choices is made, each choice
being from n possibilities with probabilities
$p_1, p_2, \ldots, p_n$, then the total amount of information
generated is

$$H = -N \sum_i p_i \log p_i.$$

If N is sufficiently large, the expected number of digits
required to register the particular choice made is
arbitrarily close to H, providing the correspondence between
sequences of digits and sets of choices is correctly made.
If incorrectly made it will be greater than H. Moreover,
if N is sufficiently large the probability of needing
more than H digits is very small.

10. It can be shown that if we require certain reasonable
properties of a measure of choice or uncertainty then the
formula $-\sum p_i \log p_i$ necessarily follows. These required
properties and the proof of this statement are given in
Appendix I. The chief property is that the measure be in
a sense additive--if a choice be decomposed into a series
of choices the total choice is the sum (properly weighted)
of the individual choices.

11. Finally we note that quantities of the type $\sum p_i \log p_i$
have appeared previously as measures of randomness, particularly
in statistical mechanics. Indeed the H in Boltzmann's
H theorem is defined in this way, $p_i$ being the probability
of a system being in cell i of its phase space. Most of
the entropy formulas contain terms of this type.

The base which is used in taking logarithms in the formula
amounts to a choice of the unit of measure. If the base is ten
we will call the resulting units "digits;" if the base is two
the units will be called "alternatives." One digit is about 3.3
alternatives. A choice from 1000 equally likely possibilities
is 3 digits or about 10 alternatives.
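
Several of the properties listed above can be checked numerically. The sketch below is an illustration only: the function names and the small joint probability table are assumptions, and the `base` argument selects the unit (10 for "digits", 2 for "alternatives").

```python
import math

def entropy(probs, base=10):
    """H = -sum p log p, skipping zero-probability terms."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

def joint_entropy(joint, base=10):
    """Uncertainty of the joint event; joint[i][j] is p_ij."""
    return entropy([p for row in joint for p in row], base)

def mean_conditional_entropy(joint, base=10):
    """Average over A of the uncertainty of B when A is known."""
    total = 0.0
    for row in joint:
        pa = sum(row)
        if pa > 0:
            total += pa * entropy([p / pa for p in row], base)
    return total

# Hypothetical joint probabilities p_ij for two dependent events A and B.
joint = [[0.3, 0.2],
         [0.1, 0.4]]
p_a = [sum(row) for row in joint]          # probabilities for A
p_b = [sum(col) for col in zip(*joint)]    # probabilities for B

print("property 1:", entropy([1, 0, 0]) == 0)
print("property 2:", math.isclose(entropy([0.25] * 4, base=2), 2.0))
print("property 3:", joint_entropy(joint) <= entropy(p_a) + entropy(p_b))
print("property 4:", math.isclose(joint_entropy(joint),
                                  entropy(p_a) + mean_conditional_entropy(joint)))
print("property 6:", entropy(p_b) >= mean_conditional_entropy(joint))

# Units: a choice among 1000 equally likely possibilities.
uniform = [1 / 1000] * 1000
print("digits:", round(entropy(uniform, base=10), 3))
print("alternatives:", round(entropy(uniform, base=2), 3))
```

The last two lines show the change of unit: the same choice measured in digits and in alternatives differs by the factor log2 10.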

2.    Language as a Stochastic Process

A natural language, such as English, can be studied
from many points of view--lexicography, syntax, semantics,
history, aesthetics, etc. The only properties of a language
of interest in cryptography are statistical properties. What
are the frequencies of the various letters, of different digrams
(pairs of letters), trigrams, words, phrases, etc.? What is
the probability that a given word occurs in a certain message?
The "meaning" of a message has significance only in its
influence on these probabilities. For our purposes all other
properties of language can be omitted. We consider a language,
therefore, to be a stochastic (i.e. a statistical) process which
generates a sequence of symbols according to some system of
probabilities. The symbols will be the letters of the language
together with punctuation, spaces, etc., if these occur.

Conversely any stochastic process which produces a
discrete sequence of symbols will be said to be a language.
This will include such cases as:

1.  Natural written languages such as English, German, Chinese.

2.  Continuous information sources that have been rendered
discrete by some quantizing process. For example, the
quantized speech from a PCM transmitter, or a quantized
television signal.

3.  "Artificial" languages, where we merely define abstractly
a stochastic process which generates a sequence of symbols.
The following are examples of artificial languages.

(A) Suppose we have 5 letters A, B, C, D, E which are
chosen each with probability .2, successive choices
being independent. This would lead to a sequence of
which the following is a typical example.


This was constructed with the use of a table of random
numbers.*

(B) Using the same 5 letters let the probabilities be
.4, .1, .2, .2, .1 respectively, with successive
choices independent. A typical "text" in this
language is then:

A A A C D C B D C E A A D A D A C E D A
E A D C A B E D A D D C E C A A A A A D

(C) A more complicated structure is obtained if successive
letters are not chosen independently but their
probabilities depend on preceding letters. In the simplest

*  Kendall  and  Smith,  "Tables  of  Random  Sampling  Numbers," 
Cambridge,  1939. 


case of this type a choice depends only on the
preceding letter and not on ones before that. The
statistical structure can then be described by a
set of transition probabilities $p_i(j)$, the probability
that letter i is followed by letter j. The indices
i and j range over all the letters in the language.
A second equivalent way of specifying the structure
is to give the digram probabilities p(i,j), the relative
frequency of the digram i j in the language.
The letter frequencies p(i) (the probability of
letter i), the transition probabilities $p_i(j)$ and the
digram probabilities p(i,j) are related by the following
formulas.

$$p(i) = \sum_j p(i,j) = \sum_j p(j,i) = \sum_j p(j)\, p_j(i)$$

$$p(i,j) = p(i)\, p_i(j)$$

$$\sum_j p_i(j) = \sum_i p(i) = \sum_{i,j} p(i,j) = 1$$

As a specific example suppose there are three letters
A, B, C with the transition probabilities $p_i(j)$ given by
the table:

                 j
             A    B    C
        A    0   .8   .2
    i   B   .5   .5    0
        C   .5   .4   .1

A typical text in this language is the following.

A B B A B A B A B A B A B A B B B A B B B B B A B
A B A B A B A B B B A C A C A B B A B B B B A B B
A B A C B B B A B A

The next increase in complexity would involve trigram
frequencies but no more. The choice of a letter would
depend on the preceding two letters but not on the
text before that point. A set of trigram frequencies
p(i,j,k) or equivalently a set of transition probabilities
$p_{ij}(k)$ would be required. Continuing in
this way one obtains successively more complicated
stochastic processes. In the general n-gram case
a set of n-gram probabilities $p(i_1, i_2, \ldots, i_n)$
or of transition probabilities $p_{i_1, i_2, \ldots, i_{n-1}}(i_n)$
is required to specify the statistical structure.

(D) Stochastic processes can also be defined which produce
a text consisting of a sequence of "words."
Suppose there are 5 letters A, B, C, D, E and 16
"words" in the language with associated probabilities:

    .10 A       .16 BEBE    .11 CABED   .04 DEB
    .04 ADEB    .04 BED     .05 CEED    .15 DEED
    .05 ADEE    .02 BEED    .08 DAB     .01 EAB
    .01 BADD    .05 CA      .04 DAD     .05 EE

Suppose successive "words" are chosen independently
and are separated by a space. A typical message
might be:


If all the words are of finite length this process
is equivalent to one of the preceding type, but the
description may be simpler in terms of the word
structure and probabilities. We may also generalize
here and introduce transition probabilities between
words, etc.
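
The artificial languages (B) and (D) are simple enough to sample by machine in place of a table of random numbers. The sketch below is an assumption-laden illustration: the probability tables are a reading of the figures given above (the word list in particular is taken from a partly legible table), the function name and the seed are arbitrary, and the output is not meant to reproduce the typical texts in the memorandum.

```python
import random

# Letter probabilities for example (B).
LETTER_PROBS_B = {"A": 0.4, "B": 0.1, "C": 0.2, "D": 0.2, "E": 0.1}

# Word probabilities for example (D), as read from the table.
WORD_PROBS_D = {
    "A": 0.10, "BEBE": 0.16, "CABED": 0.11, "DEB": 0.04,
    "ADEB": 0.04, "BED": 0.04, "CEED": 0.05, "DEED": 0.15,
    "ADEE": 0.05, "BEED": 0.02, "DAB": 0.08, "EAB": 0.01,
    "BADD": 0.01, "CA": 0.05, "DAD": 0.04, "EE": 0.05,
}

def sample_independent(probs, count, rng):
    """Draw `count` symbols independently with the given probabilities."""
    items = list(probs)
    weights = [probs[item] for item in items]
    return rng.choices(items, weights=weights, k=count)

rng = random.Random(1945)  # arbitrary seed, so that runs are repeatable
text_b = " ".join(sample_independent(LETTER_PROBS_B, 40, rng))
text_d = " ".join(sample_independent(WORD_PROBS_D, 12, rng))
print(text_b)
print(text_d)
```

Because successive choices are independent in both cases, the same sampling routine serves for letters and for words; only the table of probabilities changes.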

These artificial languages are useful in constructing
simple problems and examples to illustrate various possibilities.
We can also approximate to a natural language by means of a
series of simple artificial languages. The zero order
approximation is obtained by choosing all letters with the same
probability and independently. The first order approximation is
obtained by choosing successive letters independently but each
letter having the same probability that it does in the natural
language. Thus in the first order approximation to English E
is chosen with probability .12 (its frequency in normal English)
and W with probability .02, but there is no influence between
adjacent letters and no tendency to form the preferred digrams
such as TH, ED, etc. In the second order approximation digram
structure is introduced. After a letter is chosen, the next
one is chosen in accordance with the frequencies with which
the various letters follow the first one. This requires a
table of digram frequencies $p_i(j)$, the frequency with which
letter j follows letter i. In the third order approximation
trigram structure is introduced. Each letter is chosen with
probabilities which depend on the preceding two letters.

3.    The  Series  of  Approximations  to  English 

To give a visual idea of how this series of processes
approaches a language, typical sequences in the approximations
to English have been constructed and are given below. In all
cases we have assumed a 27 symbol "alphabet," the 26 letters
and a space.

1.  Zero order approximation (symbols independent and
equiprobable).


2.  First order approximation (symbols independent but with
frequencies of English text).


3.  Second order approximation (digram structure as in English).


4.  Third order approximation (trigram structure as in English).


5.  1st Order Word Approximation. Rather than continue with
tetragram, ..., n-gram structure, it is easier and better
to jump at this point to word units. Here words are
chosen independently but with their appropriate frequencies.


6.  2nd Order Word Approximation. The word transition
probabilities are correct but no further structure is included.


The resemblance to ordinary English text increases
quite noticeably at each of the above steps. Note that these
samples have reasonably good structure out to about twice the
range that is taken into account in their construction. Thus
in (3) the statistical process insures reasonable text for
two-letter sequences, but four-letter sequences from the sample can
usually be fitted into good sentences. In (6) sequences of four
or more words can easily be placed in sentences without unusual
or strained constructions. The particular sequence of ten
words "attack on an English writer that the character of this"
is not at all unreasonable.

The first two samples were constructed by the use of
a book of random numbers in conjunction for (2) with a table
of letter frequencies. This method might have been continued
for (3), (4), and (5), since digram, trigram, and word frequency
tables are available, but a simpler equivalent method was used.
To construct (3) for example one opens a book at random and
selects a letter at random on the page. This letter is
recorded. The book is then opened to another page and one reads
until this letter is encountered. The succeeding letter is
then recorded. Turning to another page this second letter is
searched for and the succeeding letter recorded, etc. A similar
process was used for (4), (5), and (6). It would be interesting
if further approximations could be constructed, but the labor
involved becomes enormous at the next stage.
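
The book procedure for the second order approximation translates readily into a short program. The sketch below is an illustration under stated assumptions: the "book" is a short hypothetical stand-in string rather than a real text, "opening to another page" is modelled as jumping to a random position, and reading wraps around from the end of the string to its beginning.

```python
import random

# Hypothetical stand-in for the book; only capital letters and spaces occur.
BOOK = ("THE ENEMY INTERCEPTS THE CRYPTOGRAM AND CALCULATES THE "
        "PROBABILITIES OF THE VARIOUS MESSAGES AND KEYS ")

def book_method(book, length, rng):
    """Build a second order (digram) approximation by the book procedure."""
    current = book[rng.randrange(len(book))]  # a letter selected at random
    out = [current]
    while len(out) < length:
        start = rng.randrange(len(book))      # open the book at another place
        for offset in range(len(book)):       # read until the letter is met
            i = (start + offset) % len(book)
            if book[i] == current:
                current = book[(i + 1) % len(book)]  # record the succeeding letter
                out.append(current)
                break
    return "".join(out)

sample = book_method(BOOK, 40, random.Random(3))
print(sample)
```

Every adjacent pair of symbols in the output occurs somewhere in the book, which is exactly the sense in which digram structure is preserved; a longer and more representative book would be needed for the result to resemble English.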

The stochastic process 6 is already sufficiently close
to English for many cryptographic purposes since most
cryptanalysis is based on "local" structure of not more than two or
three words in length.

4.    Graphical Representation of a Markoff Process

Stochastic processes of the type described above are
known mathematically as discrete Markoff processes and have
been extensively studied in the literature.* The general case

* For a detailed treatment see M. Fréchet, "Méthode des fonctions
arbitraires. Théorie des événements en chaîne dans le cas
d'un nombre fini d'états possibles." Paris, Gauthier-Villars,
1938.

can be described as follows. There exist a finite number of
possible "states" of a system: $S_1, S_2, \ldots, S_n$. In addition
there is a set of transition probabilities; $p_i(j)$ the probability
that if the system is in state $S_i$ it will next go to
state $S_j$. To make this Markoff process into a language generator
we need only assume that a letter is produced for each
transition from one state to another. The states will correspond
to the "residue of influence" from preceding letters.

The situation can be represented graphically as shown
in Figs. 1, 2, 3 and 4. The "states" are the junction points
in the graph and the probabilities and letters produced for a
transition are given beside the corresponding line. Fig. 1 is
for the example B in Section 2, while Fig. 2 corresponds to
example C. In Fig. 1 there is only one state since successive
letters are independent. In Fig. 2 there are as many states
as letters. If a trigram example were constructed there would
be at most $n^2$ states corresponding to the possible pairs of
letters preceding the one being chosen. Figs. 3 and 4 show
graphs for the case of word structure in example D. In these
S corresponds to the "space" symbol. In Fig. 3 each word has
a separate chain of branches from the left to the right junction
point, while in Fig. 4 the branches have been combined, simplifying
the graph.
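
A graph of this kind can be written down directly as a table of branches. The sketch below is illustrative only: the two-letter graph is a made-up example (not one of Figs. 1 to 4), and the names `Graph` and `generate` are hypothetical. Each branch carries a probability, the letter produced, and the state reached.

```python
import random

# state -> list of (probability, letter produced, next state)
Graph = dict[str, list[tuple[float, str, str]]]

def generate(graph: Graph, start: str, n: int, rng: random.Random) -> str:
    """Walk the graph for n transitions, emitting one letter per transition."""
    state, out = start, []
    for _ in range(n):
        branches = graph[state]
        weights = [p for p, _, _ in branches]
        _, letter, state = rng.choices(branches, weights=weights, k=1)[0]
        out.append(letter)
    return "".join(out)

# Hypothetical digram-type graph: the state is the last letter produced,
# and the letter b is never followed by another b.
toy_graph: Graph = {
    "a": [(0.5, "a", "a"), (0.5, "b", "b")],
    "b": [(1.0, "a", "a")],
}
toy_text = generate(toy_graph, "a", 30, random.Random(0))
```

In a digram-type graph like this one there are as many states as letters; a single-state graph with one branch per letter would give independent letters.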

5. Pure and Mixed Languages

As we have indicated above a "language" for our purposes
can be considered to be generated by a Markoff process.
Among the possible discrete Markoff processes there is a group
with special properties of significance in cryptographic work.
This special class consists of the "ergodic" processes and we
shall call the corresponding languages "pure languages." Although
a rigorous definition of an ergodic process is somewhat
involved, the general idea is simple. In an ergodic process
every sequence produced by the process is the same in statistical
properties. Thus the letter frequencies, digram frequencies,
etc., obtained from particular sequences will, as the
lengths of the sequences increase, approach definite limits
independent of the particular sequence. Actually this is not
true of every sequence but the set for which it is false has
probability zero. Roughly the ergodic property means statistical
homogeneity.

All the examples of artificial languages given above
are pure, the corresponding Markoff process being ergodic.
This property is related to the structure of the corresponding
graph. If the graph has two properties the language it generates
will be pure. These properties are:

1. The graph cannot be divided into two parts A and B such
that it is impossible to go from junction points in part
A to junction points in part B along lines of the graph
in the direction of arrows and also impossible to go
from nodes in part B to nodes in part A.

2. A closed series of lines in the graph with all arrows
on the lines pointing in the same orientation will be
called a "circuit." The "length" of a circuit is the
number  of  lines  in  it.  Thus  in  Fig.  4  the  series  BEE 
is a circuit of length 4. The second property required
is that the greatest common divisor of the lengths of
all circuits in the graph be one.

If the first condition is satisfied but the second
one violated by having the greatest common divisor equal to
d > 1, the sequences have a certain type of periodic structure.
The various sequences fall into d different classes which are
statistically the same apart from a shift of the origin (i.e.,
which letter in the sequence is called letter 1). By a shift
of from 0 up to d - 1 any sequence can be made statistically
equivalent to any other. A simple example with d = 2 is the
following. There are three possible letters a, b, c. Letter
a is followed with either b or c with probabilities 1/3 and 2/3
respectively. Either b or c is always followed by letter a.
Thus a typical sequence is

    a b a c a c a c a b a c a b a b a c a c.

This type of situation is not of much importance for our work.
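
The second property can be checked mechanically. The following sketch computes the greatest common divisor of all circuit lengths by a standard breadth-first technique; it assumes every junction point can be reached from the starting one and can return to it, and the function name `circuit_gcd` and the adjacency-list representation are illustrative choices, not part of the original text.

```python
from collections import deque
from math import gcd

def circuit_gcd(edges: dict[str, list[str]], start: str) -> int:
    """gcd of the lengths of all circuits, assuming every node is
    reachable from `start` and can get back to it."""
    level = {start: 0}          # distance from start along the arrows
    queue = deque([start])
    g = 0
    while queue:
        u = queue.popleft()
        for v in edges.get(u, []):
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
            # Every line u -> v contributes level[u] + 1 - level[v];
            # the gcd of these values equals the gcd of circuit lengths.
            g = gcd(g, level[u] + 1 - level[v])
    return g

# The d = 2 example from the text: a -> b or c, then b -> a and c -> a.
periodic = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
# A hypothetical graph with a circuit of length 1, so the gcd is one.
aperiodic = {"x": ["x", "y"], "y": ["x"]}
```

For `periodic` the function returns 2, matching d = 2 above, while `aperiodic` gives 1 and so satisfies the second property.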

If the first condition is violated the graph may be
"separated" into a set of subgraphs each of which satisfies the
first condition. We will assume that the second condition is
also satisfied for each subgraph. We have in this case what
may be called a "mixed" language made up of a number of pure
components. The components correspond to the various subgraphs.
If $L_1, L_2, L_3, \ldots$ are the component languages we may write

$$L = p_1 L_1 + p_2 L_2 + p_3 L_3 + \cdots$$

where $p_i$ is the a priori probability of the component language $L_i$.

Physically the situation represented is this. There
are several different languages $L_1, L_2, L_3, \ldots$ which are each
of homogeneous statistical structure (i.e., they are pure
languages). We do not know a priori which is to be used, but
once the sequence starts in a given pure component $L_i$ it continues
indefinitely according to the statistical structure of that
component. We do have, however, a set of a priori probabilities
for the various components, $p_1, p_2, \ldots$

As an example one may take two of the artificial
languages defined above and assume $p_1 = .2$ and $p_2 = .8$. A
sequence from the mixed language

$$L = .2\, L_1 + .8\, L_2$$

would be obtained by choosing first $L_1$ or $L_2$ with probabilities
.2 and .8 and after this choice generating a sequence from
whichever was chosen.
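
The essential point, that the component is chosen once and then used for the whole sequence, can be made concrete with a small sketch. The two component languages here are hypothetical stand-ins (independent, equally likely letters over two different alphabets), not the artificial languages of Section 2, and the function names are illustrative.

```python
import random

def pure_language(alphabet: str):
    """Hypothetical pure component: independent, equally likely letters."""
    def generate(n: int, rng: random.Random) -> str:
        return "".join(rng.choice(alphabet) for _ in range(n))
    return generate

def sample_mixed(components, priors, n: int, rng: random.Random) -> str:
    # The component is chosen once, according to the a priori probabilities,
    component = rng.choices(components, weights=priors, k=1)[0]
    # and the entire sequence is then generated from that component alone.
    return component(n, rng)

L1 = pure_language("ab")
L2 = pure_language("xy")
mixed_text = sample_mixed([L1, L2], [0.2, 0.8], 40, random.Random(1))
```

Any single sequence therefore shows the statistics of only one component; the mixture is visible only across many independently generated sequences.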

A natural language, such as English or German, is
not, of course, pure. Different kinds of text, literary,
newspaper, technical or military, display consistently different
types of structure. These differences are small, however,
in comparison with the differences between different natural
languages. If only local structure (letter, digram and trigram
frequencies, for instance) is of much importance, it is reasonable
to consider "normal English" to be nearly pure.

6.    Information  Rate  and  Redundancy  of  a  Language 

Suppose we have a pure language L produced by a given
Markoff process. Associated with the language there are certain
parameters which are of significance in questions of transforming
the language and in cryptography. The most important
of these is what we will call the "information rate" R for the
language. It measures the rate at which the Markoff process
"generates information," as determined by the measurement of
the amount of choice available on the average per letter of
text that is produced. In Section 1 we defined the amount of
choice when there are various possibilities with probabilities
$p_1, p_2, \ldots, p_n$ as

$$H = -\sum_i p_i \log p_i .$$

In a Markoff process with a number of different "states" there
will be a choice value $H_i$ for each of these states and a probability
of being in each of the states (or a frequency with which
this state occurs). If this relative frequency for state i is
$P_i$, the average amount of choice is

$$R = \sum_i P_i H_i$$

summed over all the states. This is the definition of the
information rate for the language. If $p_i(j)$ is the probability
of producing letter j when in state i we have

$$H_i = -\sum_j p_i(j) \log p_i(j),$$

the sum being over all the letters in the language. Thus

$$R = -\sum_{i,j} P_i \, p_i(j) \log p_i(j).$$

The information rate R has the units of alternatives
(or digits) per letter since it measures the average amount of
choice per letter of text that is produced.
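
As a numerical illustration of the last formula, the sketch below evaluates R from the state frequencies $P_i$ and the letter probabilities $p_i(j)$. Logarithms are taken to base 2 (an assumption, giving alternatives per letter), the function name `information_rate` is hypothetical, and the probability tables are illustrative inputs.

```python
from math import log2

def information_rate(state_freq: dict, letter_prob: dict) -> float:
    """R = -sum_i sum_j P_i * p_i(j) * log2 p_i(j)."""
    rate = 0.0
    for state, P in state_freq.items():
        for p in letter_prob[state].values():
            if p > 0:                     # terms with p = 0 contribute nothing
                rate -= P * p * log2(p)
    return rate

# One state, independent letters with probabilities 1/2, 1/4, 1/8, 1/8.
rate_independent = information_rate(
    {"s": 1.0},
    {"s": {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}},
)

# Three states a, b, c: a is followed by b (1/3) or c (2/3);
# b and c are always followed by a, so only state a offers any choice.
rate_periodic = information_rate(
    {"a": 1 / 2, "b": 1 / 6, "c": 1 / 3},
    {"a": {"b": 1 / 3, "c": 2 / 3}, "b": {"a": 1.0}, "c": {"a": 1.0}},
)
```

The first case gives 1.75 alternatives per letter; in the second, states with no freedom of choice have $H_i = 0$ and the rate is half the choice value of state a.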

A second parameter of importance is the "maximum rate"
$R_0$ for the source. This is defined simply as the logarithm of
the number of different letters in the language. $R_0$ is also
measured in alternatives or digits per letter. If successive
letters are chosen independently and each letter is equally
likely $R_0 = R$. Otherwise we have $R < R_0$.

R and $R_0$ are actually two limiting cases of information
rates for the language. $R_0$ may be said to be the rate
when no statistical structure is taken into consideration and
R is the rate when all the structure is taken into account.
Between these there is an infinite series of rates $R_1, R_2,
R_3, \ldots$ which take some of the statistical structure into
account. $R_1$ takes the letter frequencies into account and is
defined by

$$R_1 = -\sum_i p(i) \log p(i)$$

where p(i) is the probability of letter i. $R_2$ takes digram
structure into account and is defined by

$$R_2 = -\sum_{i,j} p(i)\, p_i(j) \log p_i(j)$$

where the p(i) are letter probabilities and $p_i(j)$ the transition
probabilities, i.e., the probability of letter i being followed
by letter j. In general we define

$$R_n = -\sum p(i_1, \ldots, i_{n-1})\, p_{i_1 \cdots i_{n-1}}(i_n) \log p_{i_1 \cdots i_{n-1}}(i_n)$$

where the sum is on all indices $i_1, \ldots, i_n$ and $p(i_1, \ldots, i_{n-1})$
is the probability of the (n-1)-gram $i_1 \cdots i_{n-1}$ with
$p_{i_1 \cdots i_{n-1}}(i_n)$ the probability of this (n-1)-gram being followed
by letter $i_n$. $R_n$ may be called the n-gram information rate for
the language. It can be shown that

$$R_0 \ge R_1 \ge R_2 \ge \cdots \ge R_\infty = R .$$

These rates determine how much a language can be "compressed"
in length by a suitable encoding process. A language with
maximum rate $R_0$ and rate R can be transformed in such a way
that a sequence of letters N letters long is transformed into
a sequence of letters only N' letters long where

$$N' R_0 = N R .$$

(This is approximate and only exactly true in the limit as
$N \to \infty$.) Thus the information is "compressed" in the ratio

$$\frac{N'}{N} = \frac{R}{R_0} .$$

This is the greatest compression ratio possible. It makes use
of all the statistical structure of the language. If only
n-gram structure is made use of, a compression ratio

$$\frac{R_n}{R_0}$$

is the best possible.

The compression obtained in this way is only a
statistical gain. Some infrequent sequences are encoded into
much longer sequences while the more probable ones go into
shorter sequences so that on the average the length is decreased.
It is the type of compression obtained in telegraphy
by using the shortest telegraph symbol, a single dot, for the
most frequent letter E, while uncommon letters Q, Z, etc., are
encoded into longer telegraph symbols. An average reduction
in time of transmission is obtained but there are possible
sequences, e.g., Q Q Q . . ., which require much longer.

Performing a transformation on a language L which
compresses as much as possible will be called reducing L to
a "normal" form. When this has been done it can be shown
that all letters in the output are equally likely and independent.
Actually to realize this transformation would usually
require an infinitely complex machine, but we can always approximate
it as closely as desired with a machine of finite
complexity.
The quantity

$$D = R_0 - R$$

will be called the redundancy rate of the language. It measures
the excess information that is sent if sequences in the language
are transmitted in their original form (without compression or
reduction to normal form). Correspondingly there is a whole
series of redundancy rates:

$$D_1 = R_0 - R_1$$
$$D_2 = R_0 - R_2$$
$$\cdots$$
$$D_n = R_0 - R_n$$

$D_n = R_0 - R_n$ is the redundancy rate due to n-gram structure in the
language.

The redundancy D can also be said to measure the
amount of statistical structure in the language. If the sequence
is purely random D = 0 while at the other extreme if
each letter is completely determined by preceding letters with
no freedom of choice, D has its maximum possible value $R_0$. It
is sometimes convenient to use the "relative" redundancy $D/R_0$
which must lie between 0 and 100%.


If we have a source of rate R, maximum rate $R_0$ (both
in digits per letter) and consider the possible sequences of
N letters these fall into two groups for N large. One group of
"high probability" sequences contains about

$$10^{RN}$$

sequences (where we have assumed R measured in digits per letter).
All of these have substantially the same logarithmic probability.
The remainder of the total of $10^{R_0 N}$ possible sequences are of
very small probability. In fact their total probability approaches
zero as N increases. The logarithm of the probability
of an individual sequence in the high probability group is thus
about -RN. In a precise statement of these results we must allow
a certain fuzziness in R, i.e., replace R by $R \pm \epsilon$ where $\epsilon \to 0$
as $N \to \infty$.

Reduction of a language to normal form is performed
by properly matching the probabilities of sequences to the
length of the corresponding sequences in the normal form. The
"high probability" sequences are translated into short sequences
and the remainder into longer sequences.

An example will clarify the results we have given.
Let the language contain 4 letters A, B, C, D. In a sequence
successive letters are chosen independently, the four letters
having probabilities 1/2, 1/4, 1/8, 1/8, respectively. We have

$$R_0 = \log_2 4 = 2 \text{ alternatives/letter}$$

$$R_1 = R_2 = \cdots = R = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{2}{8}\log_2\tfrac{1}{8}\right)
= \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{4} = \tfrac{7}{4} \text{ alternatives/letter}$$

By a suitable transformation the average length of sequences
can be reduced by the factor $\frac{7/4}{2} = 7/8$. A transformation to do
it is the following. First we translate into a sequence of
binary digits (0 or 1) by the following table

    A    0
    B    10
    C    110
    D    111

After this pairs of the binary digits are translated into the
original alphabet as follows

    00   A'
    01   B'
    10   C'
    11   D'

For a typical sequence this works out as shown below:

    0 10 110 0 10 0 110 10 10 111 0 0 111 0 111 0

Regrouping and translation back into letters:

    01 01 10 01 00 11 01 01 01 11 00 11 10 11 10
    B' B' C' B' A' D' B' B' B' D' A' D' C' D' C'
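
The two tables can be applied mechanically. The sketch below reproduces the worked sequence; the names `TO_BITS`, `TO_LETTER` and `to_normal_form` are illustrative, and the handling of a leftover unpaired binary digit (it is simply dropped) is an assumption, since the text does not discuss that case.

```python
TO_BITS = {"A": "0", "B": "10", "C": "110", "D": "111"}
TO_LETTER = {"00": "A'", "01": "B'", "10": "C'", "11": "D'"}

def to_normal_form(text: str) -> list[str]:
    """Translate letters to binary digits, then pairs of digits to letters."""
    bits = "".join(TO_BITS[ch] for ch in text)
    # Take the digits two at a time; a final unpaired digit is dropped
    # (an assumption made only to keep the sketch short).
    pairs = [bits[i:i + 2] for i in range(0, len(bits) - 1, 2)]
    return [TO_LETTER[pair] for pair in pairs]

# The 16-letter sequence whose binary translation appears in the text.
original = "ABCABACBBDAADADA"
final = to_normal_form(original)
```

Here `original` has 16 letters and `final` has 15, in agreement with the regrouped sequence above.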

In this case there are 16 letters in the original and 15 in the
final text. Thus due to the small redundancy and the shortness
of the text only part of the saving is evident. In a long text
however the full reduction of 7/8 would appear. This may be
verified directly in this case. In a long text of N letters
each letter will appear with about its appropriate frequency.
Thus the number of binary digits will be about

$$N\left[\tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{8}\cdot 3 + \tfrac{1}{8}\cdot 3\right] = \tfrac{7}{4} N$$

since each A gives one binary digit, each B gives two, etc. The
number of letters in the final text is half this since each
pair of binary digits goes into one letter. Thus the reduction
is by a factor 7/8.

It is also easy to see in this case that the binary
digits are equally likely and independent, and from this that
the final text letters are also.

This situation is more complicated for mixed languages
and we shall not enter into it here. We may note, however,
that if

$$L = p_1 L_1 + \cdots + p_n L_n$$

where $L_i$ is pure with rate $R_i$, then the long sequences of L
fall into (n+1) groups. The first n groups correspond to the
pure components. Those in group i number about

$$10^{R_i N}$$

and have logarithmic probability about

$$\log p_i - R_i N .$$

The last group contains all other sequences and has a small
total probability.

7. Redundancy Characteristic of a Language

The  form  of  the  curve  D(N)  as  a  function  of  N  na; 
called  the  redundancy  characteristic  of  the  language.  In  : 
rough  way  it  describes  the  way  in  which  the  redundancy  appt 
In  Fig.  5  several  types  of  characteristics  are  shown,  all  i 
the  same  final  redundancy.  The  way  in  which  this  approach 
is  of  importance  in  cryptography.  For  languages  which  reac 
final  redundancy  at  one  or  two  letters  (Curves  1  and  2)  one 
of  cipher  (ideal  ciphers)  can  be  used.  For  those  which  rer 
near  zero  out  to  fairly  large  N  (like  Curve  5)  another  type 
appropriate.  Natural  languages  are  apt  to  show  a  character 
more  like  3,  and  this  makes  them  difficult  to  encipher  witi 
security  by  simple  means.      ■  . 

Examples:

1.  A  language  in  which  successive  letters  are  independer 
but  with  different  probabilities  has  a  characteristic 
Type  1. 

2. Consider a language constructed as follows. First select
$26^8$ different sequences of letters, each 16 letters long,
from the $26^{16}$ possible sequences of this length. This
should be a random selection. The 16-letter sequences
chosen are the "words" of the language. Messages are
random sequences of these "words." Such a language has
a characteristic like the Curve 5.

3.  A  language  with  digram  structure  only,  such  as  Exampl 
in  Section  2  above,  has  a  characteristic  of  the  Type 
Fig.  5,  reaching  its  final  value  at  N  =  2. 

4.  English  has  the  characteristic  3  in  Fig.  5. 


The redundancy characteristic describes how the
structure in the language is spread out. If the structure is
localized, the curve rises rapidly to its final value. If
there are long range influences the asymptotic value is approached
more slowly. If the structure is "locally random"
the curve will remain near zero for small N.

8.    Secrecy  Systems 

Before we can apply any mathematical analysis to
secrecy systems, it is necessary to idealize the situation
suitably, and to define in a mathematically acceptable way
what we shall mean by a secrecy system. A "schematic" diagram
of a general secrecy system is shown in Fig. 6. At the transmitting
end there are two information sources: a message source
and a key source. The key source produces a particular key from
among those which are possible in the system. This key is transmitted
by some means, supposedly not interceptible, e.g. by messenger,
to the receiving end. The message source produces a
message (the "clear") which is enciphered, and the resulting
cryptogram sent to the receiving end by a possibly interceptible
means, for example radio. At the receiving end the cryptogram
and key are combined in the decipherer to recover the message.

Evidently the encipherer performs a functional operation.
If M is the message, K the key, and E the enciphered message,
or cryptogram, we have

$$E = f(M, K)$$

i.e. E is a function of M and K. We prefer to think of this,
however, not as a function of two variables but as a (one parameter)
family of operations or transformations, and we write it

$$E = T_i M .$$

The transformation $T_i$ applied to message M produces cryptogram E.
The index i corresponds to the particular key being used. If
there are m possible keys there will be m transformations in the
family $T_1, T_2, \ldots, T_m$.

At the receiving end it must be possible to recover
M, knowing E and K. Thus the transformations in the family
must have unique inverses $T_i^{-1}$,

$$M = T_i^{-1} E ;$$

at any rate this inverse must exist uniquely for every E which
can be obtained from an M with key i.
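
A toy instance may help fix the notation. In the sketch below the family $T_i$ is taken to be the 26 cyclic shifts of the alphabet, a choice made purely for illustration; the names `T`, `T_inverse` and `choose_key`, and the uniform key probabilities, are likewise assumptions and not part of the text.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def T(i: int, message: str) -> str:
    """Transformation T_i: shift every letter i places along the alphabet."""
    return "".join(ALPHABET[(ALPHABET.index(ch) + i) % 26] for ch in message)

def T_inverse(i: int, cryptogram: str) -> str:
    """Unique inverse of T_i: shift back by i places."""
    return T((26 - i) % 26, cryptogram)

def choose_key(key_probs: dict[int, float], rng: random.Random) -> int:
    """Key source: pick an index i according to the key probabilities."""
    keys = list(key_probs)
    return rng.choices(keys, weights=[key_probs[k] for k in keys], k=1)[0]

# Hypothetical key source in which all 26 keys are equally likely.
uniform_keys = {i: 1 / 26 for i in range(26)}
key = choose_key(uniform_keys, random.Random(0))
E = T(key, "ATTACK")
M = T_inverse(key, E)
```

Here `M` equals the original message for every key, which is the requirement that each transformation in the family have a unique inverse.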

The key source can be thought of as a "probability
machine," something which chooses from the possible keys according
to a system of probabilities. Mathematically then, the
keys (or the parameter of the family of transformations) belong
to a probability or measure space. Hence we arrive at the
definition:

A secrecy system is a family of uniquely reversible
transformations $T_i$ of a message space $\Omega_M$ into a cryptogram
space $\Omega_E$, the parameter i belonging to a probability space $\Omega_K$.
Conversely any set of entities of this type will be called a
"secrecy system."

The system can be visualized mechanically as a
machine with one or more controls on it. A sequence of letters,
the message, is fed into the input of the machine and a second
series emerges at the output. The particular setting of the
controls corresponds to the particular key being used. Some
method must be prescribed for choosing the key from all the
possible ones.

To make the problem mathematically tractable we shall
assume that the enemy knows the system being used. That is, he
knows the family of transformations $T_i$, and the probabilities
of choosing various keys.

One might object to this as being unrealistic, in that
the cryptanalyst often does not know what system was used or the
probabilities of various keys. There are two answers to this:

1. The assumption is actually the one ordinarily used
in cryptographic studies. It is pessimistic and
hence safe, but in the long run realistic (particularly
in military work), since one must expect his
system to be found out eventually through espionage,
captured equipment, prisoners, etc. Thus, even when
an entirely new system is devised, so that the enemy
cannot assign any a priori probability to it without
discovering it himself, one must still live with the
expectation of his eventual knowledge.

2. The restriction is much weaker than appears at first,
due to our broad definition of what constitutes the
system. Suppose a cryptographer intercepts a message
and does not know whether a substitution, transposition,
or Vigenère type cipher was used. He can consider
this as being enciphered by a system in which
part of the key is the specification of which of these
types was used, the next part being the particular
key for that type. These three different possibilities
are assigned probabilities according to his
best guesses of the a priori probabilities of the encipherer
using the respective types of cipher.


A second possible objection to our definition of
secrecy systems is that no account is taken of the common
practice of inserting nulls in a message and the use of multiple
substitutes. Thus there is not a unique $E = T_i M$, but
actually the encipherer can choose at will among a number of
different E's for the same message and key. This situation
could be handled, but would only add complexity at the present
stage, without altering any of the basic results. To define
the more general secrecy system, one would add a second parameter
to the transformations $T_i$, which corresponds to the
various choices of cryptograms corresponding to a given message
and key. It is possible, but not always desirable, to
consider this second parameter as part of the key, since it
does not need to be transmitted to the receiving point.

We also assume that the enemy is in possession of the
measure in the space $\Omega_M$, the a priori probabilities of various
messages. The same objection and essentially the same answers
might be given to this assumption as to his knowledge of the
transformations $T_i$. This measure, however, we do not consider
as part of the secrecy system for reasons which will appear
later. The secrecy system whose transformations are $T_i$ will
be denoted by T and this concept includes the space $\Omega_M$ on
which T operates (without its measure), the transformations $T_i$,
and the spaces $\Omega_K$ and $\Omega_E$, the former with its probability
measure.


If the messages are produced by a Markoff process
of the type described previously, the probabilities of various
messages are determined by the structure of the Markoff process.
For the present, however, we wish to take a more general view
of the situation and regard the messages as merely an abstract
set of entities with associated probabilities, not necessarily
composed of a sequence of letters and not necessarily produced
by a Markoff process.

It should be emphasized that throughout the paper a
secrecy system means not one but a set of many transformations.
After the key is chosen only one of these transformations is
used and we might be led to define a secrecy system as a single
transformation on a language.* The enemy, however, does not
know what key was chosen and the "might have been" keys are as
important for him as the actual one. Indeed it is only the
existence of these other possibilities that gives the system
any secrecy. Since the secrecy is our primary interest, we
are forced to this rather elaborate concept of a secrecy
system. This type of situation where possibilities are as
important as actualities is almost the rule in games of
strategy. The course of a chess game is largely controlled
by threats which are not carried out. See also the "virtual
existence" of unrealized imputations in von Neumann's theory
of games.

*A. A. Albert in a paper presented at a Manhattan, Kansas,
meeting of the American Mathematical Society (Nov. 22, 1941)
entitled "Some Mathematical Aspects of Cryptography" has
defined a ciphering system in this way. With this limited
definition about all one can do is to describe and classify
from the mathematical point of view various types of transformations.

There are a number of difficult epistemological
questions connected with the theory of secrecy, or in fact
with any theory which involves questions of probability
(particularly a priori probabilities, Bayes' theorem, etc.)
when applied to a physical situation. Treated abstractly,
probability theory can be put on a rigorous logical basis
with the modern measure theory approach.* As applied to
reality, however, especially when "subjective" probabilities
and unrepeatable experiments are concerned, there are many
questions of logical validity. For example in the approach
to secrecy made here, a priori probabilities of various keys
are assumed known by the enemy cryptographer: how can one
determine operationally if his estimates are correct, on the
basis of his knowledge of the situation?

It may happen that the keys are chosen by the encipherer
according to one system of probabilities, i.e. one
measure in the key space $\Omega_K$, and that the enemy cryptanalyst
estimates a second different system of probabilities $\Omega_K'$ in
this space which are entirely reasonable in the light of
his knowledge of the situation: which is correct? I believe
that both are correct. The calculation based on $\Omega_K$
leads to the solution when the enemy knows just how the
keys are chosen and the solution based on $\Omega_K'$ leads to solutions
which are correct for a situation agreeing with the
enemy's knowledge of the actual situation. It appears intuitively
that the enemy's lack of knowledge can only do
him harm, and probably this can be proved, but this question
has not been investigated. In fact, we assume only one
measure $\Omega_K$ in the key space. Similar remarks may be made
regarding measure in the message space $\Omega_M$.

*See J. L. Doob, "Probability as Measure," Annals of Math.
Stat., v. 12, 1941, pp. 206-214.

A. Kolmogoroff, "Grundbegriffe der Wahrscheinlichkeitsrechnung,"
Ergebnisse der Mathematik, v. 2, No. 3 (Berlin, 1933).


Actually in practical situations, only extreme
errors in a priori probabilities of keys and messages cause
much error in the important parameters. This is because of
the exponential behavior of the number of messages, etc.,
and the logarithmic measures employed.

With regard to the application of the mathematical
theory of probability to physical situations there are two
main theories or ways of setting up the correspondence. (1)
The frequency theory. Probability is correlated with relative
frequency of an event. This is the correspondence used by
the practicing statistician, in principle by the physicist,
etc. (2) The degree of belief approach. Probability is a
subjective phenomenon and measures one's degree of belief in
the occurrence of an event. This approach is seen often in
the work of historians, judges, and in everyday life. Although
this latter approach has often been attacked as meaningless
we cannot agree with this opinion. In the first place,
the intuitive approach can be given a rigorous mathematical
foundation. This has been done in a very elegant way by
B. O. Koopman.* Essentially one need only assume that a
be capable of making probability judgments (Event A is more or
less probable than event B or they are equiprobable) and that
his judgments be self consistent (e.g. if he judges A more
probable than B and B more probable than C he should judge A
more probable than C). One can even establish numerical
by the use of a "standard gauge," for example a roulette wheel,
and thus relate the subjective and the frequency probabilities.
In the second place, on pragmatic grounds one can hardly
the subjective applications, since almost all of our everyday
decisions are based on this sort of probability judgment.
Cryptographic work involves both types of applications. In
the use of frequency tables, significance tests etc., the
cryptanalyst is following the frequency approach. In the
"intuitive" methods of cryptanalysis (probable words etc.) the
degree of belief approach is more in evidence.

We may remark that a single operation on a language which is
reversible forms a degenerate type of secrecy system under our
definition: a system with only one key of unit probability. Such a
system has no secrecy; the cryptanalyst finds the message by applying
the inverse of this transformation, the only one in the system, to
the intercepted cryptogram. The decipherer and cryptanalyst in this
case

*B. O. Koopman, "The Axioms and Algebra of Intuitive Probability,"
Annals of Mathematics, v. 41, no. 2, 1940, p. 269. "Intuitive
Probabilities and Sequences," v. 42, no. 1, 1941, p. 169.


possess the same information. In general, the only difference
between the decipherer's knowledge and the enemy cryptanalyst's
knowledge is that the decipherer knows the particular key being used,
while the cryptanalyst only knows the a priori probabilities of the
various keys in the set. The process of deciphering is that of
applying the inverse of the particular transformation used in
enciphering to the cryptogram. The process of cryptanalysis is that
of attempting to determine the message (or the particular key) given
only the cryptogram and the a priori probabilities of various keys
and messages.

A system will be called "closed" if any possible cryptogram can be
deciphered with any possible key. This means that the inverse
transformations $T_i^{-1}$ are all defined for every element in the
cryptogram space.

We shall use the notation $|M|$ for the "size" of the message space:

    $|M| = -\sum P(M) \log P(M)$

where $P(M)$ is the probability of message M and the sum is over all
messages of just N letters. Thus $|M|$ is a function of N and
measures the amount of "choice" in the selection of an N letter
message. For large N, $|M|$ is approximately RN. Similarly $|K|$ is
the size of the key space

    $|K| = -\sum P(K) \log P(K)$

the sum being over all keys.
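
The "size" of a space is simply the sum $-\sum p \log p$ over its
probabilities. The following is a minimal sketch of that computation.
The function name `space_size`, the choice of base 2 for the logarithm,
and both example key spaces are assumptions made for illustration; the
text does not fix a base or give these numbers.

```python
import math

def space_size(probs, base=2):
    """Return -sum p*log(p), the 'size' of a probability space."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical key space: 26 equally likely keys.
uniform_keys = [1 / 26] * 26
# Hypothetical skewed key space: one key is used half of the time.
skewed_keys = [0.5] + [0.5 / 25] * 25

size_uniform = space_size(uniform_keys)   # equals log2(26)
size_skewed = space_size(skewed_keys)     # smaller: less "choice"
```

With equally likely keys the size reduces to the logarithm of the
number of keys; any departure from equal likelihood lowers it.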

9.    Representation of Systems

A secrecy system can be represented in various ways. One which is
convenient for illustrative purposes is a line diagram, as in
Figs. 7, 10, 11. The possible messages are represented by points at
the left and the possible cryptograms by points at the right. If a
certain key, say key 1, transforms message $M_2$ into a cryptogram E,
then $M_2$ and E are connected by a line labeled 1, etc. From each
possible message there must be exactly one line emerging for each
different key.

A second representation is by means of a rectangular array. This may
be done in three different ways. For the closed system of Fig. 7, the
three arrays are as follows:


     M \ K     1     2     3

      M1      E1    E4    E2
      M2      E3    E1    E4
      M3      E4    E3    E1
      M4      E2    E2    E3

[Second and third arrays illegible in the scan.]

[...] All of these arrays and the line diagram contain equivalent
information; from any one the others can be derived.

These representations describe the set of transformations in the
system. To specify the system the probabilities of the various keys
must also be given. This may be done by merely listing the keys with
their probabilities. Similarly the message space is specified by
giving the probabilities of the various messages.

A more common way is to describe the set of transformations by
stating what operations one performs on the message for an arbitrary
key to obtain the cryptogram. Similarly one defines implicitly the
probabilities for various keys by describing how a key is chosen, or
what we know of the enemy's habits of key choice. The probabilities
for messages are implicitly determined by stating our a priori
knowledge of the enemy's language habits, the tactical situation
(which will influence the probable content of the message) and any
special information we may have regarding the cryptogram.

10.  Notation 



The following notation will generally be followed.

E  =  the enciphered message or cryptogram
$\Omega_M$  =  the message space, with its associated probabilities
$\Omega_K$  =  the key space, with its associated probabilities, also
      a probability space
$\Omega_E$  =  the cryptogram space, also a probability space, since
      the probabilities in $\Omega_M$ and $\Omega_K$ induce
      probabilities for each cryptogram
$m_i$  =  the i-th letter of the message
$e_i$  =  the i-th letter of the cryptogram
$k_i$  =  the i-th letter of the key when it can be so described

Generally P stands for a probability. Conditional probabilities are
indicated with subscripts. Thus

$P(M)$  =  probability of message M
$P(E)$  =  probability of cryptogram E
$P(K)$  =  probability of key K
$P_M(E)$  =  conditional probability of E if message M is chosen
$P_E(M)$  =  conditional probability of M if cryptogram E is
      intercepted, i.e. the a posteriori probability of M if E is
      observed
Q  =  equivocation, a concept to be defined precisely later, which
      measures the uncertainty of some knowledge defined only by
      probabilities. We also have conditional equivocations, thus
      $Q_M(K)$ is the equivocation of the key knowing the message.
$|K| = -\sum P(K) \log P(K)$, the size of the key space
$|M| = -\sum P(M) \log P(M)$, the size of the message space
$|E| = -\sum P(E) \log P(E)$, the size of the cryptogram space
m  =  number of different keys
N  =  number of intercepted letters
$R_0$  =  maximum information rate for a language
R  =  mean rate
$D = R_0 - R$  =  redundancy of a language
T, R, S, etc.  =  secrecy systems
$T_i$, $R_j$, $S_k$, etc.  =  particular transformations of these
      systems

11.    Some Examples of Secrecy Systems

In this section a number of examples of ciphers will be given. These
will often be referred to in the remainder of the paper for
illustrative purposes.

1.    Simple Substitution Cipher.

In this cipher each letter of the message is replaced by a fixed
substitute, usually also a letter. Thus the message

    $M = m_1 m_2 m_3 m_4 \ldots$

becomes

    $E = e_1 e_2 e_3 e_4 \ldots$

[...] X is the substitute for A, G is the substitute for B, etc.
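
A simple substitution can be sketched as a lookup table built from a
permutation of the alphabet. The helper names and the key used here
(the alphabet written backwards) are assumptions chosen for
illustration, not a key taken from the memorandum.

```python
import string

ALPHABET = string.ascii_uppercase

def make_substitution(key):
    """Build encipher and decipher tables from a 26-letter permutation."""
    if sorted(key) != list(ALPHABET):
        raise ValueError("key must be a permutation of A-Z")
    enc = dict(zip(ALPHABET, key))
    dec = {c: m for m, c in enc.items()}
    return enc, dec

def apply_table(table, text):
    """Replace every letter of text by its entry in the table."""
    return "".join(table[ch] for ch in text)

# Hypothetical key: the reversed alphabet, so A -> Z, B -> Y, ...
KEY = ALPHABET[::-1]
enc, dec = make_substitution(KEY)
cryptogram = apply_table(enc, "NOWISTHE")
recovered = apply_table(dec, cryptogram)
```

Because the key is a permutation, the decipher table is just the
inverted dictionary, which is why the substitution has an inverse.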

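2.    Transposition (Fixed Period d).

The message is divided into groups of length d and a permutation
applied to the first group, the same permutation to the second group,
etc. The permutation is the key and can be represented by a
permutation of the first d integers. Thus for d = 5, we might have
2 3 1 5 4 as the permutation. This means that

    $m_1 m_2 m_3 m_4 m_5 m_6 m_7 m_8 m_9 m_{10} \ldots$

becomes

    $m_2 m_3 m_1 m_5 m_4 m_7 m_8 m_6 m_{10} m_9 \ldots$

Sequential application of two or more transpositions will be called
compound transposition. If the periods are $d_1, d_2, \ldots, d_s$
it is clear that the result is a transposition of period d, where d
is the least common multiple of $d_1, d_2, \ldots, d_s$.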

3.    Vigenere, and Variations.

In this cipher the key consists of a series of d letters. These are
written repeatedly below the message and the two added modulo 26
(considering the alphabet numbered from A = 0 to Z = 25). Thus

    $e_i = m_i + k_i \pmod{26}$

where $k_i$ is of period d in the index i. For example with the key
G A H we obtain

    message         N O W I S T H E ...
    repeated key    G A H G A H G A ...
    cryptogram      T O D O S A N E ...

The Vigenere of period 1 is called the Caesar cipher. It is a simple
substitution in which each letter of M is advanced a fixed amount in
the alphabet. This amount is the key, which may be any number from 0
to 25. The so called Beaufort and Variant Beaufort are similar to the
Vigenere, and encipher by the equations

    $e_i = k_i - m_i \pmod{26}$

    $e_i = m_i - k_i \pmod{26}$

respectively. The Beaufort of period one is called the reversed
Caesar cipher.
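
The three enciphering equations differ only in the sign pattern, which
a short sketch makes concrete. The function and helper names are
assumptions for illustration; messages are taken to be upper-case
letters only.

```python
A = ord("A")

def _nums(text):
    """Letters to numbers, A = 0 ... Z = 25."""
    return [ord(ch) - A for ch in text]

def _letters(nums):
    """Numbers (reduced mod 26) back to letters."""
    return "".join(chr(n % 26 + A) for n in nums)

def vigenere(msg, key):
    """e_i = m_i + k_i (mod 26), key repeated with period len(key)."""
    k = _nums(key)
    return _letters(m + k[i % len(k)] for i, m in enumerate(_nums(msg)))

def beaufort(msg, key):
    """e_i = k_i - m_i (mod 26); applying it twice restores the message."""
    k = _nums(key)
    return _letters(k[i % len(k)] - m for i, m in enumerate(_nums(msg)))

def variant_beaufort(msg, key):
    """e_i = m_i - k_i (mod 26); this undoes a Vigenere with the same key."""
    k = _nums(key)
    return _letters(m - k[i % len(k)] for i, m in enumerate(_nums(msg)))

cryptogram = vigenere("NOWISTHE", "GAH")
```

Note that the Beaufort is its own inverse, while the Variant Beaufort
and the Vigenere are inverses of each other.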

The application of two or more Vigeneres in sequence will be called
the compound Vigenere. It has the equation

    $e_i = m_i + k_i + l_i + \cdots + s_i \pmod{26}$

where $k_i, l_i, \ldots, s_i$ in general have different periods. The
period of their sum

    $k_i + l_i + \cdots + s_i$

as in compound transposition, is the least common multiple of the
individual periods.

4.  Vernam System.*

When the Vigenere is used with an unlimited key, never repeating, we
have the Vernam system, with

    $e_i = m_i + k_i \pmod{26}$

the $k_i$ being chosen at random and independently among 0, 1, ...,
25. If the key is a meaningful text we have the "running key" cipher.

5.  Bazeries Cylinder.

In this mechanical system 25 thick disks are used, each having a
mixed alphabet stamped around the edge. These disks can be arranged
in any order on a spindle, and the particular arrangement used
constitutes the key. With the disks in their proper order, a message
is enciphered by turning the disks so that the message appears on a
line parallel to the axis of the spindle. Any other line of letters
may then be chosen for the cryptogram. To decipher, the cryptogram is
arranged on a line and the decipherer looks for another line which
then makes sense.

*G. S. Vernam, "Cipher Printing Telegraph Systems for Secret Wire
and Radio Telegraphic Communications," Journal Amer. Inst. of Elect.
Eng., v. XLV, pp. 109-115, 1926.

6.    Digram, Trigram, and N-gram Substitution.

Rather than substitute for letters one can substitute for digrams,
trigrams, etc. General digram substitution requires a key consisting
of a permutation of the $26^2$ digrams. It can be represented by a
table in which the row corresponds to the first letter of the digram
and the column to the second letter, entries in the table being the
substitutes (usually also digrams).

7.    Interrupted Key Vigenere.

The Vigenere and its variations can be used with an interrupted key.
The sequence of key letters is started again at irregularly spaced
points. Thus, if the entire key sequence is X P G H F T R S, one can
interrupt irregularly to get

X  .P  OH  F  TI  H  X  P  Gfi  ?  lE'XPlPO  »  •  • 

The points of interruption can be determined in various ways.
(1) Whenever a certain letter occurs in the clear. (2) Whenever a
certain letter occurs in the cryptogram. (3) An interrupting letter,
say J, can be reserved as a signal and the encipherer interrupts the
key at his discretion. (4) No signal is used and the decipherer
locates the interruptions by the appearance of meaningless text in
the decipherment. In place of starting the key again at each
interruption one can omit letters of it or reverse the direction of
progress. There are many variations and combinations of these
methods.

8.    Single Mixed Alphabet Vigenere.

This is a simple substitution followed by a Vigenere.

    $e_i = f(m_i) + k_i$

The "inverse" of this system is a Vigenere followed by simple
substitution

    $e_i = g(m_i + k_i)$

    $m_i = g^{-1}(e_i) - k_i$



9.    Vigenere with Progressing Key.

The period of a Vigenere can be expanded by adding a fixed number t
to the key at each appearance; thus the n-th group is enciphered by
the equation

    $e_i = m_i + k_i + nt$

Also this can be varied by adding t and s alternately to the key,
etc.

10.  Matrix System.*

One method of n-gram substitution is to operate on successive n-grams
with a matrix having an inverse. The letters are assumed numbered
from 0 to 25, making them elements of an algebraic ring. From the
n-gram $m_1 m_2 \ldots m_n$ of message, the matrix $a_{ij}$ gives an
n-gram of cryptogram

    $e_i = \sum_{j=1}^{n} a_{ij} m_j, \qquad i = 1, \ldots, n$

The matrix $a_{ij}$ is the key, and deciphering is performed with the
inverse matrix. The inverse matrix will exist if and only if the
determinant $|a_{ij}|$ has an inverse element in the ring.
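
The existence condition on the determinant can be checked directly for
the digram case (n = 2). This is a sketch under assumptions: the 2 x 2
key shown is hypothetical, the function names are illustrative, and
the message is assumed to have an even number of letters.

```python
from math import gcd

def det2(a):
    """Determinant of a 2x2 integer matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inverse2_mod26(a):
    """Inverse of a 2x2 matrix over the integers mod 26, if it exists."""
    d = det2(a) % 26
    if gcd(d, 26) != 1:
        raise ValueError("determinant has no inverse mod 26")
    d_inv = pow(d, -1, 26)  # modular inverse, Python 3.8 or later
    adj = [[a[1][1], -a[0][1]], [-a[1][0], a[0][0]]]
    return [[(d_inv * x) % 26 for x in row] for row in adj]

def apply_matrix(a, text):
    """Encipher successive digrams: e_i = sum_j a_ij m_j (mod 26)."""
    nums = [ord(c) - 65 for c in text]
    out = []
    for s in range(0, len(nums), 2):
        m = nums[s:s + 2]
        out += [(a[i][0] * m[0] + a[i][1] * m[1]) % 26 for i in range(2)]
    return "".join(chr(n + 65) for n in out)

KEY = [[3, 3], [2, 5]]      # hypothetical key; determinant 9 is a unit mod 26
cryptogram = apply_matrix(KEY, "HELP")
recovered = apply_matrix(inverse2_mod26(KEY), cryptogram)
```

A matrix such as [[2, 4], [6, 8]] is rejected, since its determinant
shares the factor 2 with 26 and so has no inverse in the ring.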

11.  The Playfair Cipher.

This is a particular type of digram substitution governed by a mixed
25 letter alphabet written in a 5 x 5 square. (The letter J is often
dropped in cryptographic work; it is very infrequent, and when it
occurs can be replaced by I.) Suppose the key square is as shown
below

    L Z Q C P
    A G N O U
    R D M I F
    K Y H V S
    X B T E W


*See L. S. Hill, "Cryptography in an Algebraic Alphabet," American
Math. Monthly, v. 36, No. 6, 1, 1929, pp. 306-312. Also "Concerning
Certain Linear Transformation Apparatus of Cryptography," v. 38,
No. 3, 1931, pp. 135-154.


The substitute for a digram AC, for example, is the pair of letters
at the other corners of the rectangle defined by A and C, i.e. LO,
the L taken first since it is above A. If the digram letters are on a
horizontal line as RI, one uses the letters to their right DF; RF
becomes DR. If the letters are on a vertical line, the letters below
them are used. Thus PS becomes UW. If the letters are the same nulls
may be used to separate them or one may be omitted, etc.
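
The three Playfair rules can be sketched for a single digram. The key
square assumed here (rows LZQCP, AGNOU, RDMIF, KYHVS, XBTEW) is the
one consistent with the worked digrams in the text. The function name
is illustrative, and the rectangle rule is implemented on the reading
that the substitute for each letter lies in that letter's own column.

```python
SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]

def playfair_pair(square, a, b):
    """Substitute one digram of two different letters using a 5x5 key square."""
    pos = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    (ra, ca), (rb, cb) = pos[a], pos[b]
    if ra == rb:
        # Same row: take the letters to the right, wrapping around.
        return square[ra][(ca + 1) % 5] + square[rb][(cb + 1) % 5]
    if ca == cb:
        # Same column: take the letters below, wrapping around.
        return square[(ra + 1) % 5][ca] + square[(rb + 1) % 5][cb]
    # Rectangle: each letter is replaced by the corner in its own column.
    return square[rb][ca] + square[ra][cb]

examples = {pair: playfair_pair(SQUARE, pair[0], pair[1])
            for pair in ("AC", "RI", "RF", "PS")}
```

Doubled letters are not handled; the text leaves that to nulls or
omission at the encipherer's choice.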

12.    Multiple Mixed Alphabet Substitution.

In this cipher there are a set of d simple substitutions which are
used in sequence. If the period d is four

    $m_1 m_2 m_3 m_4 m_5 m_6 \ldots$

becomes

    $f_1(m_1) f_2(m_2) f_3(m_3) f_4(m_4) f_1(m_5) f_2(m_6) \ldots$


13.    Autokey Cipher.

A Vigenere type system in which either the message itself or the
resulting cryptogram is used for the "key" is called an autokey
cipher. The encipherment is started with a "priming key" (which is
the entire key in our sense) and continued with the message or
cryptogram displaced by the length of the priming key as indicated
below with the priming key COMET. The message used as "key":

    MESSAGE       S E N D S U P P L I E S ...
    KEY           C O M E T S E N D S U P ...
    CRYPTOGRAM    U S Z H L M T C O A Y H ...

The cryptogram used as "key":

    MESSAGE       S E N D S U P P L I E S ...
    KEY           C O M E T U S Z H L O H ...
    CRYPTOGRAM    U S Z H L O H O S T S ...
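
Both autokey variants follow from one loop that extends the key as it
goes. The function name and the `use_cryptogram` flag are assumptions
for illustration; the message and priming key are those of the example
in the text.

```python
A = ord("A")

def autokey_encipher(msg, priming, use_cryptogram=False):
    """Vigenere-type encipherment whose key continues with msg or cryptogram."""
    key = [ord(c) - A for c in priming]
    out = []
    for i, ch in enumerate(msg):
        e = (ord(ch) - A + key[i]) % 26
        out.append(chr(e + A))
        # Extend the key with the cryptogram letter or the message letter.
        key.append(e if use_cryptogram else ord(ch) - A)
    return "".join(out)

by_message = autokey_encipher("SENDSUPPLIES", "COMET")
by_cryptogram = autokey_encipher("SENDSUPPLIES", "COMET", use_cryptogram=True)
```

The first five cryptogram letters agree in both variants, since only
the priming key has been used up to that point.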


14.    Fractional Ciphers.

In these, each letter is first enciphered into two or more letters or
numbers and these symbols are somehow mixed (e.g. by transposition).
The result may then be retranslated into the original alphabet. Thus
using a mixed 25 letter alphabet for the key we may translate letters
into two digit quinary numbers by the table

          0  1  2  3  4
      0   L  Z  Q  C  P
      1   A  G  N  O  U
      2   R  D  M  I  F
      3   K  Y  H  V  S
      4   X  B  T  E  W

Thus B becomes 41. After the resulting series of numbers is
transposed in some way they are taken in pairs and translated back
into letters.
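
The translate, mix, and retranslate steps can be sketched with the
table above. The mixing step used here (all row digits first, then all
column digits) is a hypothetical choice for illustration, since the
text only says the numbers are "transposed in some way"; the function
names are likewise assumptions.

```python
SQUARE = ["LZQCP", "AGNOU", "RDMIF", "KYHVS", "XBTEW"]
TO_DIGITS = {ch: (r, c) for r, row in enumerate(SQUARE) for c, ch in enumerate(row)}

def fractionate(msg):
    """Letters -> quinary digit stream, mix, then digit pairs -> letters."""
    digits = [d for ch in msg for d in TO_DIGITS[ch]]
    # Hypothetical mixing step: all row digits first, then all column digits.
    mixed = digits[0::2] + digits[1::2]
    return "".join(SQUARE[r][c] for r, c in zip(mixed[0::2], mixed[1::2]))

def defractionate(crypt):
    """Undo fractionate: recover the row digits and column digits, then pair them."""
    mixed = [d for ch in crypt for d in TO_DIGITS[ch]]
    n = len(crypt)
    rows, cols = mixed[:n], mixed[n:]
    return "".join(SQUARE[r][c] for r, c in zip(rows, cols))

cryptogram = fractionate("SENDHELP")
recovered = defractionate(cryptogram)
```

Each cryptogram letter now depends on digits from two different
message letters, which is the point of the fractionation.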

15.  Codes.

In codes words (or sometimes syllables) are replaced by substitute
letter groups. Sometimes a cipher of one kind or another is applied
to the result.


12.    Valuations of Secrecy Systems

There are a number of different criteria that should be applied in
estimating the value of a proposed secrecy system. The more important
of these are:

1.    Amount of Secrecy.

There are some systems that are perfect; the enemy is no better off
after intercepting any amount of material than before. Other systems,
although giving him some information, do not yield a unique
"solution" to intercepted cryptograms. Among the uniquely solvable
systems, there are wide variations in the amount of labor required to
effect this solution and in the amount of material that must be
intercepted to make the solution unique.


2.  Size of Key.

The key must be transmitted by non-interceptible means from
transmitting to receiving ends. Sometimes it must be memorized. It is
desirable then to have the key as small as possible.

3.  Complexity of Enciphering and Deciphering Operations.

These should, of course, be as simple as possible. If they are done
manually, complexity leads to loss of time, errors, etc. If done
mechanically, complexity leads to large expensive machines.

4.  Propagation of Errors.

In certain types of secrecy systems an error of one letter in
enciphering or transmission leads to a large amount of error in the
deciphered text. The errors are spread out by the deciphering
operation, causing the loss of much information and frequent need for
repetition of the cryptogram. It is naturally desirable to minimize
this error expansion.

5.  Expansion of Message.

In some types of secrecy systems the size of the message is increased
by the enciphering process. This undesirable effect may be seen in
systems where one attempts to swamp out message statistics by the
addition of many nulls, or where multiple substitutes are used. It
also occurs in many "concealment" types of systems (which are not
usually secrecy systems in the sense of our definition).

13.    Equivalence Classes in the Key Space

It may happen that in a ciphering system two or more different keys,
say keys 1, 2, and 7, are equivalent. By this we mean that for every
M

    $T_1 M = T_2 M = T_7 M$

These keys will not be considered as distinct but will be thrown into
an equivalence class. It is clear that the cryptanalyst can never
determine which particular one of these was used but only (at best)
the class. The probability for the class is of course the sum of the
probabilities of the different keys in the class.

As an example, in the Playfair cipher described above, the following
are equivalent key squares.

    G H X P Y        E C I Z F
    Z F E C I        N R D L O
    L O N R D        V S Q T A
    T A V S Q        W B M K U
    K U W B M        X P Y G H

We can think of the possible equivalence classes in this case as
arrangements of a 25 letter alphabet on a 5 x 5 square drawn on an
oriented torus. The number of different keys is not 25! but
$25!/5^2 = 24!$.
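
The equivalence of the two squares can be checked by brute force over
all digrams of distinct letters. This is a sketch: the function names
are illustrative, and the rectangle rule is implemented on the
assumption that each letter's substitute lies in that letter's own
column.

```python
from itertools import permutations

LEFT = ["GHXPY", "ZFECI", "LONRD", "TAVSQ", "KUWBM"]
RIGHT = ["ECIZF", "NRDLO", "VSQTA", "WBMKU", "XPYGH"]

def substitute(square, a, b):
    """Playfair substitute for a digram of two different letters."""
    pos = {ch: (r, c) for r, row in enumerate(square) for c, ch in enumerate(row)}
    (ra, ca), (rb, cb) = pos[a], pos[b]
    if ra == rb:
        return square[ra][(ca + 1) % 5] + square[rb][(cb + 1) % 5]
    if ca == cb:
        return square[(ra + 1) % 5][ca] + square[(rb + 1) % 5][cb]
    return square[rb][ca] + square[ra][cb]

def shift(square, dr, dc):
    """Cyclically shift rows by dr and columns by dc (a move on the torus)."""
    return ["".join(square[(r + dr) % 5][(c + dc) % 5] for c in range(5))
            for r in range(5)]

letters = "".join(LEFT)
same_cipher = all(substitute(LEFT, a, b) == substitute(RIGHT, a, b)
                  for a, b in permutations(letters, 2))
class_size = len({tuple(shift(LEFT, dr, dc)) for dr in range(5) for dc in range(5)})
```

Each square has 25 toroidal shifts that encipher identically, which is
where the divisor $5^2$ in the key count comes from.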

When we say that two secrecy systems are the same we mean that they
consist of the same set of transformations, with the same message and
cryptogram space (range and domain) and the same probabilities for
the different keys (after all identical transformations are put in
the same equivalence class).

14.    The Algebra of Secrecy Systems

If we have two secrecy systems T and R we can often combine them in
various ways to form a new secrecy system S. If T and R have the same
domain (message space) we may form a kind of "weighted sum,"

    $S = pT + qR$

where $p + q = 1$. This operation consists of first making a
preliminary choice with probabilities p and q determining which of T
and R is used. This choice is part of the key of S. After this is
determined T or R is used as originally defined. The total key of S
must specify which of T and R is used and which key of T (or R) is
used.

If T consists of the transformations $T_1, \ldots, T_m$ with
probabilities $p_1, \ldots, p_m$ and R consists of
$R_1, \ldots, R_k$ with probabilities $q_1, \ldots, q_k$ then
$S = pT + qR$ consists of the transformations
$T_1, \ldots, T_m, R_1, \ldots, R_k$ with probabilities
$pp_1, pp_2, \ldots, pp_m, qq_1, qq_2, \ldots, qq_k$ respectively.

More generally we can form the sum of a number of systems

    $S = p_1 T + p_2 R + \cdots + p_m U, \qquad \sum p_i = 1$

We note that any system T can be written as a sum of fixed operations

    $T = p_1 T_1 + p_2 T_2 + \cdots + p_m T_m$

$T_i$ being a definite enciphering operation of T corresponding to
key choice i, which has probability $p_i$.

A second way of combining two secrecy systems is by taking the
"product", shown schematically in Fig. 8. Suppose T and R are two
systems and the domain (language space) of T can be identified with
the range (cryptogram space) of R. Then we can apply first R to our
language and then T to the result of this enciphering process. This
gives a resultant operation S which we write as a product

    $S = TR$

The key for S consists of both keys of T and R which are assumed
chosen according to their original probabilities and independently.
Thus if the m keys of T are chosen with probabilities

    $p_1, p_2, \ldots, p_m$

and the n keys of R have probabilities

    $p'_1, p'_2, \ldots, p'_n$

then S has mn keys (at most; there may and often will be equivalence
classes) with probabilities $p_i p'_j$. This type of product
encipherment is often used; for example one follows a substitution by
a transposition or a transposition by a Vigenere, or applies a code
to the text and enciphers the result by substitution, transposition,
fractionation, etc.
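
Both combining operations can be sketched by representing a secrecy
system as a table from transformations to key probabilities, so that
identical transformations merge into one equivalence class
automatically. The representation, the function names, and the choice
of single-letter Caesar and reversed Caesar systems are assumptions
made for illustration.

```python
from collections import defaultdict
from fractions import Fraction

def caesar_system(n=26):
    """All n shifts x -> x + k, equally likely; a transformation is a tuple."""
    return {tuple((x + k) % n for x in range(n)): Fraction(1, n) for k in range(n)}

def reversed_caesar_system(n=26):
    """All n maps x -> k - x (mod n), equally likely."""
    return {tuple((k - x) % n for x in range(n)): Fraction(1, n) for k in range(n)}

def weighted_sum(p, T, q, R):
    """S = pT + qR; identical transformations fall into one class."""
    S = defaultdict(Fraction)
    for t, pr in T.items():
        S[t] += p * pr
    for r, pr in R.items():
        S[r] += q * pr
    return dict(S)

def product(T, R):
    """S = TR: apply R first, then T, with the two keys chosen independently."""
    S = defaultdict(Fraction)
    for t, pt in T.items():
        for r, pr in R.items():
            S[tuple(t[x] for x in r)] += pt * pr
    return dict(S)

V = caesar_system()
B = reversed_caesar_system()
S = weighted_sum(Fraction(1, 2), V, Fraction(1, 2), B)
VV = product(V, V)
```

In this sketch the 26 x 26 key pairs of the product `VV` collapse into
only 26 classes, illustrating the remark that a product has mn keys
"at most".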

A more special type of product may be defined in case both T and R
have keys of the same size which may be put in one-to-one
correspondence with the same probabilities for corresponding keys.
This may be called the "inner product," in contrast with the above
which may be more completely described as an "outer product" (these
names are derived from a rough analogy with the concepts of tensor
analysis). In the inner product, written

    $S = T \circ R$

and indicated schematically in Fig. 9, the same key (or corresponding
keys) are used for both T and R chosen with the common probabilities.

For example one may construct a transposition cipher whose key is a
permutation of the alphabet, each permutation being equally likely,
and apply first this and then a substitution based on the same
permutation. One also sees this situation in certain geometrical
types of transposition ciphers where the text is written into a
square and a permutation based on a key word applied first to the
columns and then the rows of the square.

It may be noted that multiplication (either kind) is not in general
commutative (we do not always have RS = SR), although in special
cases such as substitution and transposition it is. Since it
represents an operation it is definitionally associative. That is
R(ST) = (RS)T = RST. Furthermore we have the laws

    $p(p'T + q'R) + qS = pp'T + pq'R + qS$
    (weighted associative law for addition)

    $T(pR + qS) = pTR + qTS$
    $(pR + qS)T = pRT + qST$
    (right and left hand distributive laws)

    $p_1 T + p_2 T + p_3 R = (p_1 + p_2)T + p_3 R$

Finally with regard to this algebraic structure of secrecy
operations, we note that every closed secrecy system T has an
"inverse" T' obtained by interchanging the E and M spaces, with key
probabilities the same, and

    $(TRS)' = S'R'T'$

    $(pT + qR)' = pT' + qR'$

Note that TT' is not in general the identity (this is the reason we
do not write $T^{-1}$).

A system whose M and E spaces can be identified, a very common case
as when letter sequences are transformed into letter sequences, may
be termed endomorphic. An endomorphic system T may be raised to a
power $T^n$.

A secrecy system T whose outer product with itself is equal to T,
i.e. for which

    $TT = T$

will be called idempotent. For example simple substitution,
transposition of period p, Vigenere of period p (all with each key
equally likely) are idempotent.

The set of all endomorphic secrecy systems defined in a fixed message
space constitute an "algebraic variety," that is, a kind of algebra,
using the operations of addition and multiplication. In fact, the
properties of addition and multiplication which we have discussed
lead to the following result.

Theorem 1:    The set of endomorphic ciphers with the same message
              space and the two combining operations of weighted
              addition and outer multiplication form a linear
              associative algebra with a unit element, apart from
              the fact that the coefficients in a weighted addition
              must be non-negative and sum to unity.

It should be emphasized that these combining operations of addition
and multiplication apply to secrecy systems as a whole. The product
of two systems TR should not be confused with the product of the
transformations in the systems $T_i R_j$, which also appears often in
this work. The former TR is a secrecy system, i.e. a set of
transformations with associated probabilities; the latter is a
particular transformation. Further the sum of two systems pR + qT is
a system; the sum of two transformations is not defined. The systems
T and R may commute without the individual $T_i$ and $R_j$ commuting,
e.g. if R is a Beaufort system of a given period, all keys equally
likely,

    $R_i R_j \neq R_j R_i$

in general, but of course RR does not depend on its order; actually

    $RR = V$

the Vigenere of the same period with random key. On the other hand,
if the individual $T_i$ and $R_j$ of two systems T and R commute,
then the systems commute.
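
The Beaufort remark can be checked on single letters. The sketch
below restricts to period one as a simplifying assumption, and the
function names are illustrative: two individual Beaufort
transformations do not commute, yet the set of all their products is
exactly the set of Vigenere shifts.

```python
N = 26

def beaufort(k):
    """R_k: x -> k - x (mod 26), a Beaufort transformation of period one."""
    return tuple((k - x) % N for x in range(N))

def vigenere(k):
    """V_k: x -> x + k (mod 26)."""
    return tuple((x + k) % N for x in range(N))

def compose(f, g):
    """Return the transformation 'first g, then f'."""
    return tuple(f[x] for x in g)

r3, r5 = beaufort(3), beaufort(5)
individual_commute = compose(r3, r5) == compose(r5, r3)

# All products R_i R_j, collected as a set of transformations.
products = {compose(beaufort(i), beaufort(j)) for i in range(N) for j in range(N)}
shifts = {vigenere(k) for k in range(N)}
```

Since $R_i R_j$ is the shift by $i - j$, each shift arises from
exactly 26 key pairs, so the product system also has all shifts
equally likely.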

It is rather surprising to find an algebraic variety with as much
structure as a linear associative algebra in which the elements have
the complexity of ciphers. In Hilbert space theory, for example, one
has a linear associative algebra, but the elements of the algebra are
transformations. Here the elements are sets of transformations with a
probability space associated with the transformation parameter.

These combining operations give us ways of constructing many new
types of secrecy systems from certain ones, such as the examples
given. We may also use them to describe the situation facing a
cryptanalyst when attempting to solve a cryptogram of unknown type.
He is, in fact, solving a secrecy system of the type

    $T = p_1 A + p_2 B + \cdots + p_r S + p' X, \qquad \sum p = 1$

where the A, B, ..., S are known types of ciphers, with the $p_i$
their a priori probabilities in this situation, and $p'X$ corresponds
to the possibility of a completely new unknown type of cipher.

In weighted addition the key size of the result is given by

    $|K| = p|K_1| + q|K_2| - (p \log p + q \log q)$

    $\phantom{|K|} = p|K_1| + q|K_2| + |K_3|$

i.e. the weighted mean of the two keys plus the size of the p, q key.
This is only in case there are no equivalences; if there are it will
always be less.

For the outer product the key size is

    $|K| \le |K_1| + |K_2|$

with equality only when there are no equivalences. In the inner
product

    $|K| \le |K_1| = |K_2|$

with equality under the same condition.
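
The key size formulas for the weighted sum and the outer product can
be verified numerically when there are no equivalences. The weights
and the two key distributions below are hypothetical values chosen for
illustration, and base-2 logarithms are an assumption.

```python
import math

def size(probs):
    """-sum p log2 p, the 'size' of a key space."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p, q = 0.25, 0.75
keys_T = [1 / 26] * 26       # hypothetical T: 26 equally likely keys
keys_R = [0.5, 0.3, 0.2]     # hypothetical R: three unequally likely keys

# Weighted sum pT + qR: key probabilities p*p_i and q*q_j.
keys_sum = [p * x for x in keys_T] + [q * x for x in keys_R]
sum_lhs = size(keys_sum)
sum_rhs = p * size(keys_T) + q * size(keys_R) + size([p, q])

# Outer product: independent keys, probabilities p_i * q_j.
keys_prod = [a * b for a in keys_T for b in keys_R]
prod_lhs = size(keys_prod)
prod_rhs = size(keys_T) + size(keys_R)
```

Merging equivalent keys into one class can only lower the left-hand
sides, which is the source of the inequalities.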

15.    Pure and Mixed Ciphers

Certain types of ciphers, such as the simple substitution, the
transposition of a given period, the Vigenere of a given period, the
mixed alphabet Vigenere, etc. (all with each key equally likely) have
a certain homogeneity with respect to key. Whatever the key, the
enciphering, deciphering and decrypting processes are essentially the
same. This may be contrasted with the cipher

    $pS + qT$

where S is a simple substitution and T a transposition of a given
period. In this case the entire system changes for enciphering,
deciphering and decryptment, depending on whether the substitution or
transposition was used.

The cause of the homogeneity in certain ciphers stems from the group
property; we notice that in the above examples of homogeneous ciphers
the product of any two transformations in the set $T_i T_j$ is equal
to a third transformation $T_k$ in the set, while $T_i S_j$ does not
equal any transformation in the cipher

    $pS + qT$

which contains only substitutions and transpositions, no products.

We might define a "pure" cipher, then, as one whose $T_i$ formed a
group. This, however, would be too restrictive since it requires that
the E space be the same as the M space, i.e. that the system be
endomorphic. The fractional transposition is as homogeneous as the
ordinary transposition without being endomorphic. The proper
definition is the following: A cipher T is pure if for every
$T_i$, $T_j$, $T_k$ there is a $T_s$ such that

    $T_i T_j^{-1} T_k = T_s$

and every key is equally likely. Otherwise the cipher is mixed. The
systems of Fig. 7 are mixed. Fig. 10 is pure if all keys are equally
likely.
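
The defining condition $T_i T_j^{-1} T_k = T_s$ can be tested by
exhaustion on a small example. This sketch assumes endomorphic
transformations represented as tuples, a hypothetical 6-letter
alphabet to keep the triple loop short, and equally likely keys; the
function names are illustrative.

```python
from itertools import product as triples

def compose(f, g):
    """Return the transformation 'first g, then f'."""
    return tuple(f[x] for x in g)

def inverse(f):
    """Inverse of a permutation given as a tuple."""
    inv = [0] * len(f)
    for x, y in enumerate(f):
        inv[y] = x
    return tuple(inv)

def is_pure(transformations):
    """True if T_i T_j^{-1} T_k lies in the set for every i, j, k."""
    ts = set(transformations)
    return all(compose(ti, compose(inverse(tj), tk)) in ts
               for ti, tj, tk in triples(ts, repeat=3))

n = 6
shifts = [tuple((x + k) % n for x in range(n)) for k in range(n)]
swap = tuple([1, 0] + list(range(2, n)))   # one extra, unrelated substitution

shifts_pure = is_pure(shifts)
mixed_pure = is_pure(shifts + [swap])
```

Adding a single foreign transformation to the shifts breaks the
condition, which mirrors the contrast between a homogeneous cipher and
a sum such as pS + qT.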

Theorem 2:    In a pure cipher the operations $T_i^{-1} T_j$ which
              transform the message space into itself form a group
              whose order is m, the number of different keys.

For

    $T_j^{-1} T_k T_k^{-1} T_j = I$

so that each element has an inverse, also the associative law is true
since these are operations, and the group property follows from

    $T_i^{-1} T_j T_k^{-1} T_l = T_s^{-1} T_k T_k^{-1} T_l = T_s^{-1} T_l$

using our assumption that $T_i^{-1} T_j = T_s^{-1} T_k$ for some s.

The operation $T_i^{-1} T_j$ means, of course, enciphering the
message with key j and then deciphering with key i which brings us
back to the message space. If T is endomorphic, i.e. the $T_i$
themselves transform the space $\Omega_M$ into itself (as is the case
with most ciphers, where both the message space and the cryptogram
space consist of sequences of letters), and the $T_i$ are a group and
equally likely, then T is pure, since

    $T_i T_j^{-1} T_k = T_i T_r = T_s$

Theorem 3:    The outer product of two pure ciphers which commute is
              pure.

For if T and R commute $T_i R_j = R_l T_m$ for every i, j with
suitable l, m, and

    [...]

The commutation condition is not necessary, however, for the product
to be a pure cipher.

A system with only one key, i.e. a single definite
operation T_1, is pure, since the only choice of indices is

    T_1 T_1^{-1} T_1 = T_1.

Thus the expansion of a general cipher into a sum of such
simple transformations also exhibits it as a sum of pure
ciphers.

An examination of the example of a pure cipher
shown in Fig. 5 discloses certain properties. The messages
fall into certain subsets which we will call residue classes,
and the possible cryptograms are divided into corresponding
residue classes. There is at least one line from each
message in a class to each cryptogram in the corresponding class,
and no line between classes which do not correspond. The
number of messages in a class is a divisor of the total
number of keys. The number of lines "in parallel" from a
message M to a cryptogram in the corresponding class is equal
to the number of keys divided by the number of messages in
the class containing the message (or cryptogram). It is shown
in the appendix that these hold in general for pure ciphers.
Summarized in a more formal statement we have

Theorem 4:     In a pure system the messages can be divided into
a set of "residue classes" C_1, C_2, ..., C_s and
the cryptograms into a corresponding set of
residue classes C'_1, C'_2, ..., C'_s with the following
properties:

(1) The message residue classes are mutually
    exclusive and collectively contain all
    possible messages. Similarly for the
    cryptogram residue classes.

(2) Enciphering any message in C_i with any key
    produces a cryptogram in C'_i. Deciphering
    any cryptogram in C'_i with any key leads
    to a message in C_i.

(3) The number of messages in C_i, say \phi_i, is
    equal to the number of cryptograms
    in C'_i and is a divisor of k, the number
    of keys.

(4) Each message in C_i can be enciphered into
    each cryptogram in C'_i by exactly k/\phi_i
    different keys. Conversely for decipherment.




The importance of the concept of a pure cipher (and
the reason for the name) lies in the fact that for them all
keys are essentially the same. Whatever key is used for
a particular message, the a posteriori probabilities of all
messages are identical. To see this, note that two different
keys applied to the same message lead to two cryptograms in
the same residue class, say C'_i. The two cryptograms therefore
could each be deciphered by k/\phi_i keys into each message
in C_i and into no other possible messages. All keys being
equally likely, the a posteriori probabilities of various
messages are thus

    P_E(M) = P(M) P_M(E) / P(E)
           = P(M) P_M(E) / \sum_M P(M) P_M(E)
           = P(M) / P(C_i)

where M is in C_i, E is in C'_i and the sum is over all messages
in C_i. If E and M are not in corresponding residue classes,
P_E(M) = 0. Similarly it can be shown that the a posteriori
probabilities of the different keys are the same in value but
these values are associated with different keys when a different
key is used. The same set of values of P_E(K) have undergone
a permutation among the keys. Thus we have the result

Theorem 5:  In a pure system the a posteriori probabilities
of various messages P_E(M) are independent of the
key that is chosen. The a posteriori probabilities
of the keys P_E(K) are the same in value
but undergo a permutation with a different key
choice.

Roughly we may say that any key choice leads to
the same cryptanalytic problem in a pure cipher. Since the
different keys all result in cryptograms in the same residue
class this means that all cryptograms in the same residue
class are cryptanalytically equivalent: they lead to the same
a posteriori probabilities of messages and, apart from a
permutation, the same probabilities of keys.

As an example of this, simple substitution with
all keys equally likely is a pure cipher. The residue class
corresponding to a given cryptogram E is the set of all
cryptograms that may be obtained from E by operations T_j T_k^{-1} E.
In this case T_j T_k^{-1} is itself a substitution and hence any
substitution on E gives another member of the same residue
class. Thus if the cryptogram is


[example cryptograms not legible in the scan]





etc. are in the same residue class. It is obvious in this
case that these cryptograms are essentially equivalent.
All that is of importance in a simple substitution with
random key is the pattern of letter repetitions, the actual
letters being dummy variables. Indeed we might dispense
with them entirely, indicating the pattern of repetitions
in E as follows:*

This notation describes the residue class but eliminates all
information as to the specific member of the class. Thus it
leaves precisely that information which is cryptanalytically
pertinent. This is related to one method of attacking simple
substitution ciphers, the method of pattern words.
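
The idea that only the pattern of repetitions matters can be made
concrete. The sketch below is an illustration added here, not the
notation of the memorandum: it numbers letters by order of first
appearance, and the function names, the sample word and the partial key
are all hypothetical.

```python
def repetition_pattern(text):
    """Number the letters of text by order of first appearance."""
    first_seen = {}
    pattern = []
    for ch in text:
        if ch not in first_seen:
            first_seen[ch] = len(first_seen)
        pattern.append(first_seen[ch])
    return tuple(pattern)

def substitute(text, key):
    """Apply a simple substitution given as a dict from letter to letter."""
    return "".join(key[ch] for ch in text)

# A hypothetical partial key, enough to encipher the sample word.
key = {"A": "Q", "T": "Z", "C": "M", "K": "B"}
plain = "ATTACK"
cipher = substitute(plain, key)

# Any one-to-one substitution leaves the pattern unchanged, so the
# pattern identifies the residue class but not the member.
print(repetition_pattern(plain))
print(repetition_pattern(cipher))
```

Two texts with different patterns cannot be related by a simple
substitution, which is the basis of the pattern-word method.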

In the Caesar type cipher only the first differences
mod 26 of the cryptogram are significant. Two cryptograms
with the same \Delta e_i are in the same residue class. One
breaks this cipher by the simple process of writing down the
26 members of the message residue class and picking out the
one which makes sense.
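
A minimal sketch of this procedure follows, assuming letters are coded
A = 0, ..., Z = 25. The function names are hypothetical; the sample
letters are the rows C R E A S and F U H D V of the same residue class.

```python
import string

ALPHABET = string.ascii_uppercase

def first_differences(text):
    """Return the differences of successive letters, mod 26."""
    nums = [ALPHABET.index(c) for c in text]
    return tuple((b - a) % 26 for a, b in zip(nums, nums[1:]))

def run_down_alphabet(cryptogram):
    """List the 26 members of the residue class of a Caesar cryptogram."""
    return [
        "".join(ALPHABET[(ALPHABET.index(c) - k) % 26] for c in cryptogram)
        for k in range(26)
    ]

candidates = run_down_alphabet("FUHDV")
for line in candidates:
    print(line, first_differences(line))
```

Every line printed carries the same first differences, and the analyst
simply picks the candidate that reads as language.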

The Vigenere of period d with random key is another
example of a pure cipher. Here the message residue class
consists of all sequences with the same first differences for
letters separated by distance d as the cryptogram. For
d = 3 the residue class is defined by

    m_1 - m_4 = e_1 - e_4
    m_2 - m_5 = e_2 - e_5
    m_3 - m_6 = e_3 - e_6
    m_4 - m_7 = e_4 - e_7
    ...

where E = e_1, e_2, ... is the cryptogram and m_1, m_2, ... is any
M in the corresponding residue class.

*Suggested by a notation used by Quine in Symbolic Logic.
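
The invariance of differences at distance d can be checked numerically.
This is a sketch under the assumption that the Vigenere adds key letters
to message letters mod 26; the function names are hypothetical and the
key is drawn at random for illustration.

```python
import random

def vigenere_encipher(message, key):
    """Add the key letters, repeated with period len(key), mod 26."""
    return [(m + key[i % len(key)]) % 26 for i, m in enumerate(message)]

def differences_at_distance(seq, d):
    """Return x_i - x_{i+d} (mod 26) wherever both letters exist."""
    return [(seq[i] - seq[i + d]) % 26 for i in range(len(seq) - d)]

message = [ord(c) - ord("A") for c in "CREASESTO"]
key = [random.randrange(26) for _ in range(3)]  # random key, period d = 3
cryptogram = vigenere_encipher(message, key)

# Letters d apart receive the same key letter, so it cancels.
print(differences_at_distance(message, 3))
print(differences_at_distance(cryptogram, 3))
```

Both printed lists agree for every choice of key, which is the defining
property of the residue class given above.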

In the transposition cipher of period d with random
key, the residue class consists of all arrangements of the e_i
in which no e_i is moved out of its block of length d, and any
two e_i at a distance d remain at this distance. This is used
in breaking these ciphers as follows. The cryptogram is written
in successive blocks of length d, one under another as below
(d = 5):

    e_1   e_2   e_3   e_4   e_5
    e_6   e_7   e_8   e_9   e_10
    e_11  e_12  .     .     .
    .     .     .     .     .

The columns are then cut apart and rearranged to make sense.
When the columns are cut apart, the only information remaining
is the residue class of the cryptogram.

Theorem 6:     If T is pure then T_i T_j^{-1} T = T where
T_i, T_j are any two transformations of T. Conversely if
this is true for any T_i, T_j in a system T then T is pure.

The first part of this theorem is obvious from the
definition of a pure system. To prove the second part we note
first that if T_i T_j^{-1} T = T then T_i T_j^{-1} T_s is a
transformation of T. It remains to show that all keys are
equiprobable. We have T = \sum_s p_s T_s and

    \sum_s p_s T_i T_j^{-1} T_s = \sum_s p_s T_s.

The term in the left hand sum with s = j yields p_j T_i.
The only term in T_i on the right is p_i T_i. Since all
coefficients are non-negative it follows that

    p_j \le p_i.

The same argument holds with i and j interchanged and

    p_j = p_i

and T is pure. Thus the condition that T_i T_j^{-1} T = T might
be used as an alternative definition of a pure system.


The property of purity in a system is connected with
idempotence. Thus consider the system S = T T' where T is
pure. We have

    T_i T_j^{-1} T_s T_t^{-1} = T_i T_j^{-1} T_j T_m^{-1} = T_i T_m^{-1}

so that the transformations of S^2 are the same as those of S,
and since both S and S^2 are pure we have

    S = S^2

Theorem 7:    If T is pure, S = T T' is pure and S^2 = S.

An endomorphic system T which satisfies the condition
T_i T_j = T_s (but not necessarily with all key probabilities
equal) can be shown to approach a pure cipher on raising to a
high power, namely the one with the same transformations, but
with all probabilities equalized. In fact the probabilities
for T^{n+1} are derived from those for T^n by a Markoff process,
of a special type due to the group property. This special
type always approaches the limit of equalized probabilities.
This same argument applies more generally. We have

Theorem 8:    Let T be any endomorphic cipher. If T^n approaches
any limit at all, which will necessarily occur if
all the transformations of T^n lie in a finite set
(no matter how large n) and the transformations of
T include the identity, then this limit will be a
pure cipher.

As an example consider the cipher

    R = p T + q S

where T is transposition with random key and S substitution
with random key. We have

    T^2 = T
    S^2 = S
    S T = T S

and hence any product containing both T's and S's, such as
T S T T S S, reduces to S T. Thus

    R^n = p^n T + q^n S + (1 - p^n - q^n) S T

As n \to \infty the first two terms approach zero and

    \lim_{n \to \infty} R^n = S T
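
The coefficients of R^n can be confirmed by brute force. The sketch
below is an added illustration, not part of the memorandum: it expands
(pT + qS)^n into all words of length n, reduces each word with the rules
T^2 = T, S^2 = S, ST = TS, and adds up the weights exactly. The function
names and the choice p = 1/3, n = 6 are assumptions for the example.

```python
from fractions import Fraction
from itertools import product

def reduce_word(word):
    """Reduce a product of T's and S's using T^2 = T, S^2 = S, ST = TS."""
    has_t, has_s = "T" in word, "S" in word
    if has_t and has_s:
        return "ST"
    return "T" if has_t else "S"

def power_coefficients(p, n):
    """Return the coefficients of T, S and ST in (pT + qS)^n, q = 1 - p."""
    q = 1 - p
    coeffs = {"T": Fraction(0), "S": Fraction(0), "ST": Fraction(0)}
    for word in product("TS", repeat=n):
        weight = Fraction(1)
        for letter in word:
            weight *= p if letter == "T" else q
        coeffs[reduce_word(word)] += weight
    return coeffs

p = Fraction(1, 3)
q = 1 - p
coeffs = power_coefficients(p, 6)
print(coeffs["T"] == p**6, coeffs["S"] == q**6)
print(coeffs["ST"] == 1 - p**6 - q**6)
```

Only the all-T word and the all-S word escape reduction to S T, which is
why their weights p^n and q^n are the only terms that vanish in the limit.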

The concepts of pure and mixed languages and pure
and mixed ciphers have an application in practical cryptanalysis,
if we interpret them somewhat loosely. When a cryptographer
starts work on a cryptogram, his first job is to determine
the original language. Approximately then he is determining
the pure component of the general language space

    L = p_1 L_1 + p_2 L_2 + ... + p_n L_n

where L_1 say is English, L_2 German, etc. Of course these are
not pure but the different components of them are fairly close
together in statistical structure.

The second thing a cryptographer does is to determine
the "type" of cipher that was used; usually this is
about the same as finding the pure component in the general
cipher system

    R = p_1 S + p_2 T + p_3 V + ...

where S say is simple substitution, T is transposition, etc.
A Vigenere V of unknown period is not a pure cipher but the
expansion

    V = p_1 V_1 + p_2 V_2 + p_3 V_3 + ...

where V_i is of period i, is into pure components (if all keys
are equally likely for any period). In solving a Vigenere
the first problem is to determine the period. The same is
true in transposition.

The reason for this initial isolation of pure
or nearly pure language and cipher is that only then can a
simple meaningful statistical analysis be carried out.


16.    Involutory  Systems 

If every transformation in a system T is its own
inverse, i.e. if

    T_i T_i = I

for every i, the system will be called involutory. Such
systems are important practically since the enciphering and
deciphering operations are then identical. This leads to
simplified instructions to cryptographic clerks in manual
operation, or in mechanical cases the same machine with the
same key setting may be used for both operations.

Examples:     In simple substitution we may limit our
transformations to those in which when letter \theta is
the substitute for \phi, \phi is the substitute for \theta.
Another example is the Beaufort cipher.
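
The involutory property of the Beaufort cipher is easy to confirm. The
sketch below assumes the usual Beaufort rule e_i = k_i - m_i (mod 26)
with letters coded A = 0, ..., Z = 25; the function name and the sample
key are hypothetical.

```python
def beaufort(text, key):
    """Apply the Beaufort rule e = k - m (mod 26) with a repeating key."""
    out = []
    for i, ch in enumerate(text):
        k = ord(key[i % len(key)]) - ord("A")
        m = ord(ch) - ord("A")
        out.append(chr((k - m) % 26 + ord("A")))
    return "".join(out)

message = "CREASESTO"
key = "KEY"
cryptogram = beaufort(message, key)

# Enciphering twice with the same key returns the message, since
# k - (k - m) = m.
print(cryptogram)
print(beaufort(cryptogram, key))
```

The same routine therefore serves for both enciphering and deciphering,
which is the practical advantage described above.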

If T is involutory, so is the system whose operations
are

    S_i^{-1} T_i S_i

since

    (S_i^{-1} T_i S_i)(S_i^{-1} T_i S_i) = S_i^{-1} T_i T_i S_i = S_i^{-1} S_i = I.

17.    Similar and Weakly Similar Systems

Two secrecy systems R and S will be said to be
similar if there exists a transformation A having an inverse
A^{-1} such that

    R = A S

This means that enciphering with R is the same as enciphering
with S and then operating on the result with the transformation
A. If we write R ~ S to mean R is similar to S then it
is clear that R ~ S implies S ~ R. Also R ~ S and S ~ T imply
R ~ T, and finally R ~ R. These are summarized in mathematical
terminology by saying that similarity is an equivalence
relation.

The cryptographic significance of similarity is that
if R ~ S then R and S are equivalent from the cryptanalytic
point of view. Indeed if a cryptanalyst intercepts a cryptogram
in system S he can transform it to one in system R by
merely applying the transformation A to it. A cryptogram in
system R is transformed to one in S by applying A^{-1}. If R
and S are applied to the same language or message space,
there is a one-to-one correspondence between the resulting
cryptograms. Corresponding cryptograms give the same
distribution of a posteriori probabilities for all messages.

If one has a method of breaking the system R then
any system S similar to R can be broken by reducing to R
through application of the operation A. This is a device
that is frequently used in practical cryptanalysis.

Examples:     As a trivial example, simple substitution where
the substitutes are not letters but arbitrary
symbols is similar to simple substitution using
letter substitutes. A second example is the
Caesar and the reversed Caesar type ciphers.
The latter is sometimes broken by first
transforming into a Caesar type. The Vigenere,
Beaufort and Variant Beaufort are all similar,
when the key is random. The "autokey" cipher
primed with the key K_1 K_2 ... K_d is similar to a
Vigenere type with the key alternately added and
subtracted mod 26. The transformation A in this
case is that of "deciphering" the autokey with
a series of d A's for the priming key.

Two systems R and S are weakly similar if there
exist two transformations A and B having inverses A^{-1} and
B^{-1} with

    R = A S B

This means that system R is the same as applying first B
to the language, then S, and finally A. This relation is
also an equivalence relation.

Finding a method of solution for system R with
language L is equivalent to finding a solution for S with
language B L.

We may note that if R is pure and S is weakly
similar to R then S is pure. This follows from

    R_i R_j^{-1} R_k = R_t

    R_i      = A S_i B
    R_j^{-1} = B^{-1} S_j^{-1} A^{-1}
    R_k      = A S_k B

where we assume corresponding transformations in R and S
to have the same subscripts. Hence

    R_i R_j^{-1} R_k = A S_i S_j^{-1} S_k B = R_t

    S_i S_j^{-1} S_k = A^{-1} R_t B^{-1} = S_t

and S is therefore pure.



Theoretical  Secrecy 


We now consider problems connected with the "theoretical
secrecy" of a system. How immune is a system to cryptanalysis
when the cryptanalyst has unlimited time and manpower available
for the analysis of cryptograms? Does a cryptogram have a
unique solution (even though it may require an impractical amount
of work to find it) and if not how many reasonable solutions does
it have? How much text in a given system must be intercepted
before the solution becomes unique? Are there systems which never
become unique in solution no matter how much enciphered text is
intercepted? Are there systems for which no information whatever
is given to the enemy no matter how much text is intercepted?

18.    Perfect Secrecy

Let us suppose the possible messages are finite in
number M_1, ..., M_n and have a priori probabilities P(M_1), ...,
P(M_n), and that these are enciphered into the possible
cryptograms E_1, ..., E_m by

    E = T_i M.

The cryptanalyst intercepts a particular E and can
then calculate the a posteriori probabilities for the various
messages, P_E(M). It is natural to define perfect secrecy by
the condition that for all E, the a posteriori probabilities are
equal to the a priori probabilities independently of the values
of these. In this case, intercepting the message has given the
cryptanalyst no information.* Any action of his which depends
on the information contained in the cryptogram cannot be altered,
for all of his probabilities as to what the cryptogram contains
remain unchanged. On the other hand, if the condition is not
satisfied there will exist situations in which the enemy has
certain a priori probabilities, and certain key and messages are
chosen, where the enemy's probabilities do change. This in turn
may affect his actions and thus perfect secrecy has not been
obtained. Hence the definition given is necessarily required by
our ideas of what perfect secrecy should mean.

*A purist might object that the enemy has obtained a bit of
information in that he knows a message was sent. This may be
answered by having among the messages a "blank" corresponding to
"no message." If no message is originated the blank is enciphered
and sent as a cryptogram. Then even this modicum of remaining
information is eliminated.

A necessary and sufficient condition for perfect
secrecy can be found as follows. We have by Bayes' theorem

    P_E(M) = P(M) P_M(E) / P(E)

and this must equal P(M) for perfect secrecy. Hence either
P(M) = 0, a solution that must be excluded since we demand the
equality independent of the values of P(M), or

    P_M(E) = P(E)

for every M and E. Conversely if P_M(E) = P(E) then
P_E(M) = P(M) and we have perfect secrecy. Thus we have the
result:

Theorem 9:    A necessary and sufficient condition for
perfect secrecy is that

    P_M(E) = P(E)

for all M and E. That is, P_M(E) must be
independent of M.

The probability of all keys that transform M_i into a given
cryptogram E is equal to that of all keys transforming M_j into
the same E.

Now there must be at least as many E's as there are M's,
since, fixing i, T_i gives a one-to-one correspondence between all
the M's and some of the E's. For perfect secrecy P_M(E) = P(E) \ne 0
for any of these E's and any M. Hence there is at least one key
transforming any M into any of these E's. But all the keys from
a fixed M to different E's must be different, and therefore the
number of different keys is at least as great as the number of
M's. It is possible to obtain perfect secrecy with no more, as
one shows by the following example. Let the M_i be numbered 1 to
n and the E_i the same, and using n keys let

    T_i M_j = E_s

where s = i + j (Mod n). In this case we see that
P_E(M) = 1/n = P(E) and we have perfect secrecy. An example is
shown with n = 5.
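
The claim can be verified exactly for n = 5. The sketch below is an
added illustration: it builds the joint distribution of message and
cryptogram for the rule s = i + j (Mod n) with equally likely keys and
applies Bayes' theorem. Indices run from 0 to n - 1 rather than 1 to n,
and the function name and the particular a priori probabilities are
assumptions chosen for the example.

```python
from fractions import Fraction

def posterior(prior, n):
    """Return P_E(M) for T_i M_j = E_s, s = (i + j) mod n, uniform keys."""
    key_prob = Fraction(1, n)
    joint = {}  # joint[(j, s)] = P(M_j, E_s)
    for j in range(n):
        for i in range(n):
            s = (i + j) % n
            joint[(j, s)] = joint.get((j, s), Fraction(0)) + prior[j] * key_prob
    # P(E_s) is the sum of the joint probabilities over all messages.
    p_e = [sum(joint[(j, s)] for j in range(n)) for s in range(n)]
    return {(j, s): joint[(j, s)] / p_e[s] for j in range(n) for s in range(n)}

# A deliberately non-uniform set of a priori probabilities.
prior = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8),
         Fraction(1, 16), Fraction(1, 16)]
post = posterior(prior, 5)

# For every cryptogram the a posteriori probabilities equal the priors.
print(all(post[(j, s)] == prior[j] for j in range(5) for s in range(5)))
```

The equality holds whatever a priori probabilities are supplied, which
is exactly the independence the definition demands.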


These perfect systems in which the number of cryptograms,
the number of messages, and the number of keys are all
equal are characterized by the properties that (1) each M is
connected to each E by exactly one line, (2) all keys are equally
likely. Thus the three matrix representations of the system are
"latin squares".

We have then concealed completely an amount of information
at most log n with a size of key log n. This is the first
example of a general principle which we will often see, that
there is a limit to what can be obtained with a given key size:
the amount of uncertainty we can introduce into the solution of a
cryptogram cannot be greater than the key size. Here we have
concealed all the information but the key size is as large as the
message space.

We now consider the case where |M| is infinite; in particular
suppose the message generated as an unending sequence of letters
by a Markoff process. The maximum rate of this source is R_0.
It is clear from our results above that no finite key will give
perfect secrecy. We suppose then that the key source generates
key also in the same manner, i.e. as an infinite sequence of
symbols with a mean rate R_K. Suppose that only a certain length
of key L_K is needed to encipher and decipher a length L_M of
message.

Theorem 10:    For perfect secrecy (when the a priori
probabilities of various messages can be anything),
for large L

    R_0 L_M \le R_K L_K

and  the  rate  (RR  *  e)  is  asymptotically 

This may be proved by the same method (essentially) as
the finite case. This case is realized by the Vernam system.

These results have been deduced on the basis of unknown
or arbitrary a priori probabilities for the messages. The key
required for perfect secrecy depends then on the total number of
possible messages, or on the maximum rate R_0 of the message
source.

One would suspect that if the message space has fixed
known statistics, so that it has a definite mean rate R of
generating information, then the amount of key needed could be
reduced in an average sense in just this ratio R/R_0, and this is
indeed true. In fact the message can be passed through a
transducer which transforms it into a normal form and reduces the
expected length in just this ratio, and then a Vernam system
may be applied to the result. Evidently the amount of key used
per letter of message is statistically reduced by a factor
R/R_0, and in this case the key source and information source are
just matched: an alternative of key conceals an alternative of
information. It is easily seen also, by the methods used in the
"Information" paper, that this is the best that can be done.

Theorem 11:    Perfect secrecy (omitting the condition of
independence of a priori probabilities) for
a source with fixed statistics and a rate
R of generating information can be achieved
with a key source which generates at the
rate (R + \epsilon) L_M / L_K, where L_M and L_K are
message and key lengths which correspond. A rate
less than R L_M / L_K is insufficient.

Perfect secrecy systems have a place in the practical
picture: they may be used either where the greatest importance
is attached to complete secrecy, e.g. correspondence between
the highest levels of command, or in cases where the number of
possible messages is small. Thus, to take an extreme example,
if only two messages "yes" or "no" were anticipated a perfect
system would be in order, with perhaps the transformation table






[transformation table not legible in the scan]





The disadvantage of perfect systems for large
correspondence systems is, of course, the equivalent amount of key
that must be sent. In succeeding sections we consider what can
be achieved with smaller key size, in particular with finite keys.

19.  Equivocation 

Let us suppose that a simple substitution cipher has
been used on English text and that we intercept a certain amount,
N letters, of the enciphered text. For N fairly large, more
than say 50 letters, there is nearly always a unique solution to
the cipher; i.e. a single good English sequence which transforms
into the intercepted material by a simple substitution. With a
smaller N, however, the chance of more than one solution is
greater; with N = 15 there will generally be quite a number of
possible fragments of text that would fit, while with N = 8 a
good fraction (of the order of 1/8) of all reasonable English
sequences of that length are possible, since there is seldom
more than one repeated letter in the 8. With N = 1 any letter
is clearly possible and has the same a posteriori probability
as its a priori probability. For one letter the system is
perfect.

This happens generally with solvable ciphers. Before
any material is intercepted we can imagine the a priori
probabilities attached to the various possible messages, and also
to the various keys. As material is intercepted, the cryptanalyst
calculates the a posteriori probabilities; and as N increases
the probabilities of certain messages increase, and of most
decrease, until finally only one is left, which has a probability
nearly one, while the total probability of all others is nearly
zero.

This calculation can actually be carried out for very
simple systems. Table 1 shows the a posteriori probabilities
for a Caesar type cipher applied to English text, with the key
chosen at random from the 26 possibilities. To enable the use
of standard letter, digram and trigram frequency tables the text
has been started at a random point (by opening a book and putting
a pencil down at random on the page). The message selected in
this way begins "creases to . . ." starting inside the word
increases. If the message were to start with the beginning of a
sentence a different set of probabilities must be used,
corresponding to the frequencies of letters, digrams, etc., at the
beginning of sentences.

The Caesar with random key is a pure cipher and the
particular key chosen does not affect the a posteriori
probabilities. To determine these we need merely list the possible
decipherments by all keys and calculate their a priori
probabilities. The a posteriori probabilities are these divided by
their sum. These possible decipherments are found by the
standard process of "running down the alphabet" from the message
and are listed at the left. These form the residue class for
the message. For one intercepted letter the a posteriori
probabilities are equal to the a priori probabilities for letters and
are shown in the column headed N = 1. For two intercepted
letters the probabilities are those for digrams adjusted to
sum to unity and these are shown in the column N = 2.
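
The normalization step can be sketched as follows. This is an added
illustration, not the computation behind Table 1: the a priori weights
are supplied by the caller, and the toy weights used here (3 for "CR",
1 for "PE", 0 otherwise) are invented purely to exercise the code and
are not English digram frequencies. The function names are hypothetical.

```python
from fractions import Fraction
import string

ALPHABET = string.ascii_uppercase

def shift(text, k):
    """Shift every letter of text by k places, mod 26."""
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % 26] for c in text)

def caesar_posteriors(cryptogram, prior):
    """Divide the a priori weight of each decipherment by their sum."""
    candidates = [shift(cryptogram, -k) for k in range(26)]
    weights = [prior(c) for c in candidates]
    total = sum(weights)
    return {c: w / total for c, w in zip(candidates, weights)}

# Toy a priori weights (an assumption for illustration only).
TOY_WEIGHTS = {"CR": Fraction(3), "PE": Fraction(1)}

def toy_prior(text):
    return TOY_WEIGHTS.get(text, Fraction(0))

posteriors = caesar_posteriors("FU", toy_prior)
print(posteriors["CR"], posteriors["PE"])
```

With a real digram or trigram table in place of the toy weights, the
same routine would produce the columns of a table of this kind.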


Table  1 

A Posteriori Probabilities for a Caesar Type Cryptogram

N = 1    N = 2    N = 3    N = 4


•  032 





,  .036 





/  • 

F  U  H  D  V 

,  .023 


G  V  I  E  W 

.  .016 


H  W  J  F  X 


-  .015, 


I  X  K  G  Y 





K  Z  M  I  A 

.  .005 

L  A  N  J  B 

.  .040 

.  ,072 

.  .250 



,  .020 


.  .022 

.  *.oi 

N  C  P  L  D 

.  ,072 

4  ,066 

O  D  Q  M  E

.  .079 

V  .034 

P  E  R  N  F 

,  ,,023 

,  .085 

.  #438 

a  n 

.  -#43 

Q  F  S  O  G

.  „002 


.  .060 



•  .066 


.  .005 

T  I  V  R  J 




U  J  W  S  K 

.  .030 

V  K  X  T  L 

.  .009 

W  L  Y  U  M 

.  .020 



X  M  Z  V  N 


Y  N  A  W  O



Z  O  B  X  P


A  P  C  Y  Q 


.  .066 

B  Q  D  Z  R 


Q  (digits)



.  .602 


Trigram frequencies have also been tabulated and these are shown
in column N = 3. For four and five letter sequences probabilities
were obtained by multiplication from trigram frequencies
since approximately

    p(ijkl) = p(ijk) p_{jk}(l)




Note that at three letters the field has narrowed down
to four messages of fairly high probability, the others being
small in comparison. At four there are two possibilities and at
five just one, the correct decipherment.

In principle this could be carried out with any system
but unless the key is very small the number of possibilities is
so large that the work involved prohibits the actual calculation.

This set of a posteriori probabilities describes how
the cryptanalyst's knowledge of the message and key gradually
becomes more precise as enciphered material is obtained. This
description, however, is much too involved and difficult to
obtain for our purposes. What is desired is a simplified
description of this approach to uniqueness of the possible
solutions.

We will first define a quantity Q called the
"equivocation" which measures in an average way the uncertainty of
the solution, or how far it is from unicity. Suppose that a
certain cryptogram E of N letters has been intercepted. The
cryptanalyst can in principle calculate the a posteriori
probabilities by the use of Bayes' theorem. Thus

    P_E(M) = P(M) P_M(E) / P(E)

Similarly the probabilities for various keys, after E has been
intercepted, are given by

    P_E(K) = P(K) P_K(E) / P(E)

The equivocation of the message should measure in some
way how spread out these probabilities P_E(M) are; how far they
are from being concentrated at one message. In line with our
general principles of measuring such dispersion, as in the cases
of choice, uncertainty, and generating information, we define
the equivocation of the message when E has been intercepted by

    Q_E(M) = - \sum_M P_E(M) log P_E(M)

the summation being over all possible messages. Similarly the
equivocation in key when E is intercepted is given by

    Q_E(K) = - \sum_K P_E(K) log P_E(K)

The same general arguments used to justify our measure
of information rate may be used here, to justify the equivocation
measure. We note that equivocation zero requires that one mes-
sage (or key) have probability one, all others zero. Equivocation
is measured in the same units as information, i.e. alternatives,
digits, etc., according as the logarithmic base is 2, 10, etc.
In fact, equivocation is almost identical with information, the
difference being one of point of view. In information we stress
the notion of how much freedom we have in choosing one element
from a set with certain probabilities; in equivocation we empha-
size the uncertainty of our knowledge of what was chosen when the
probabilities have certain values.
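
As a minimal sketch of the two definitions above (not part of the memorandum), the following Python computes the a posteriori probabilities by Bayes' theorem and then the equivocation for one intercepted E, in alternatives (base-2 logarithm). The three-message prior and the values of P_M(E) are hypothetical numbers chosen only for illustration.

```python
import math

def posterior(prior, likelihood):
    """Bayes' theorem for one fixed cryptogram E: P_E(M) = P(M) P_M(E) / P(E).

    prior[m]      -- a priori probability P(M)
    likelihood[m] -- P_M(E), the probability that message m is enciphered into E
    """
    joint = {m: prior[m] * likelihood[m] for m in prior}
    p_e = sum(joint.values())                      # P(E)
    return {m: j / p_e for m, j in joint.items()}

def equivocation(probs, base=2.0):
    """Q = -sum p log p over the non-zero probabilities."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

# Hypothetical numbers, for illustration only.
prior = {"M1": 0.7, "M2": 0.2, "M3": 0.1}
likelihood = {"M1": 0.1, "M2": 0.5, "M3": 0.0}

post = posterior(prior, likelihood)
q_e = equivocation(post.values())
print(post, q_e)
```

Passing base=10.0 gives the same quantity in digits rather than alternatives.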

Although any one number can hardly be expected to des-
cribe the set P_E(M) perfectly for all purposes, I think the Q
defined here does as well as any single statistic can. Some of
the theorems which follow indicate the mathematical "naturalness"
of this particular measure.


The values of equivocation for the Caesar type crypto-
gram considered above have been calculated and are given in the
last row of Table 1. This is the Q for both key and message,
the two being equal in this case.

The definitions given above involve a particular inter-
cepted E, and are the equivocations for that intercepted crypto-
gram. We wish, however, to find a measure of the equivocation
for the system as a whole, which will describe this progress
toward uniqueness as N increases in an average sort of way.
To do this we form a weighted average of the equivocations for
each particular intercepted message E, weighting in accordance
with the probabilities of getting the E in question. This will
be called the mean equivocation of the system, or where there
is no chance of confusion with the narrower equivocation for a
particular E, we abbreviate to merely the equivocation. The
mean equivocation of message is

    Q(M) = -\sum_{M,E} P(E) P_E(M) \log P_E(M)


the summation being over all M and all E. Since

    P(E) P_E(M) = P(E, M)

the probability of getting both E and M, we can write this as

    Q(M) = -\sum_{M,E} P(M,E) \log P_E(M)
         = -\sum_{M,E} P(M,E) \log \frac{P(M) P_M(E)}{P(E)}




    Q(K) = -\sum_{K,E} P(K,E) \log \frac{P(K) P_K(E)}{P(E)}
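
The weighted average described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the memorandum's own computation: the toy system (two messages, two equally likely keys, E = M XOR K) and its probabilities are hypothetical, and `mean_equivocation` is a name introduced here.

```python
import math
from collections import defaultdict

def mean_equivocation(joint, base=2.0):
    """Mean equivocation -sum P(x, E) log P_E(x).

    joint[(x, e)] is the joint probability P(x, E), where x is a message
    (or a key) and e is a cryptogram.
    """
    p_e = defaultdict(float)
    for (_, e), p in joint.items():
        p_e[e] += p                                  # marginal P(E)
    return -sum(p * math.log(p / p_e[e], base)
                for (_, e), p in joint.items() if p > 0)

# Hypothetical toy system: E = M XOR K on single binary letters.
p_msg = {0: 0.9, 1: 0.1}
p_key = {0: 0.5, 1: 0.5}

joint_me = defaultdict(float)   # P(M, E)
joint_ke = defaultdict(float)   # P(K, E)
for m, pm in p_msg.items():
    for k, pk in p_key.items():
        e = m ^ k
        joint_me[(m, e)] += pm * pk
        joint_ke[(k, e)] += pm * pk

q_m = mean_equivocation(joint_me)
q_k = mean_equivocation(joint_ke)
print(q_m, q_k)
```

In this toy case the cryptogram is equally likely to be 0 or 1 whatever the message, so interception leaves the message equivocation at its a priori value.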

Either of these mean equivocations is a theoretical
measure of the secrecy value of the system. We say theoretical
since even when the equivocation is zero, which corresponds to
no uncertainty as to the message, it may require a tremendous
amount of labor to locate the particular message where the proba-
bility is one. It might, for example, be necessary to try each
possible K in succession until one was found that transformed
the intercepted E into reasonable text in the language. Thus a
system would be practically very good, but theoretically solvable.
The equivocation may be said to measure the degree of secrecy
when the cryptanalyst has unlimited time and energy.

The equivocation is, of course, a function of N, the
number of letters intercepted. The functions Q(K,N) and Q(M,N)
will be called the equivocation characteristics of the system.

The following data will be helpful in forming a picture
of what small values of equivocation represent.

An equivocation of .1 alternative would result (1) if
9 times in 10 there was no uncertainty as to M, the tenth time
two M's were equally probable, or (2) if every time there were
two possibilities one with probability .983, the other with
probability .017, or (3) if 99 times in 100 there was no uncer-
tainty, the 100th time 1000 equally likely possibilities.

An equivocation of .01 would result (1) if every time
there were two possibilities one with probability .999, the other
with probability .001, or (2) if 99 times in 100 there is no un-
certainty, the other time two equally likely possibilities, or
(3) if 999 times in 1000 there is no uncertainty, the other time
6 or 7 equally likely possibilities.
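
A few of these cases can be checked numerically. The sketch below (an illustration added here, in alternatives, i.e. base-2 logarithms) evaluates the mean equivocation for four of the situations listed: case (1) and case (3) for .1, and cases (2) and (1) for .01. The helper name `h` is introduced for this example only.

```python
import math

def h(probs):
    """Equivocation -sum p log2 p of one set of a posteriori probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 9 times in 10 no uncertainty, the tenth time two equally probable messages.
case_a = 0.9 * h([1.0]) + 0.1 * h([0.5, 0.5])

# 99 times in 100 no uncertainty, the 100th time 1000 equally likely messages.
case_b = 0.99 * h([1.0]) + 0.01 * h([1.0 / 1000] * 1000)

# 99 times in 100 no uncertainty, the other time two equally likely messages.
case_c = 0.99 * h([1.0]) + 0.01 * h([0.5, 0.5])

# Every time two possibilities with probabilities .999 and .001.
case_d = h([0.999, 0.001])

print(case_a, case_b, case_c, case_d)
```

The first two values come out close to .1 and the last two close to .01.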


20.  Properties of Equivocation

Equivocation may be shown to have a number of inter-
esting properties, most of which fit into our intuitive picture
of how such a quantity should behave. We may first show, by an
example, the somewhat surprising fact that after a cryptanalyst
has intercepted certain special E's, his equivocation as to key
or message may be greater than before he intercepted anything.
The intercepted material has increased his ignorance of what
happened! Suppose there are only two messages M1 and M2 with
a priori probabilities p and q, and that a simple substitution
is used according to the following table, the two keys K1 and K2
also having the a priori probabilities p and q.







Before the interception, the equivocation of both key and message
is -(p log p + q log q), which is less than one alternative if
p ≠ q. If p >> q there is little uncertainty as to which message
and key will be chosen, M1 and K1. Now suppose he intercepts E1.
The a posteriori probabilities of both keys and both messages are
easily seen to be 1/2, and hence the equivocation for both key
and message is one alternative, greater than before. On the other
hand, if E2 is intercepted, the more probable event, the equivocation
for both key and message decreases, more than enough to
compensate for the other increase, and the mean equivocation of
both key and message decreases. This is a general property of all
secrecy systems.
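
The example can be worked numerically. The enciphering table itself is not legible in this copy, so the sketch below ASSUMES one table that is consistent with the surrounding text (E1 arises from the two "mixed" message-key pairs, E2 from the two "matching" pairs); the values p = 0.9, q = 0.1 are likewise only illustrative.

```python
import math

def h(probs):
    """Equivocation in alternatives (base-2 logarithm)."""
    return -sum(x * math.log2(x) for x in probs if x > 0)

p, q = 0.9, 0.1                       # example a priori probabilities
prior_m = {"M1": p, "M2": q}
prior_k = {"K1": p, "K2": q}

# ASSUMED table (hypothetical reconstruction, chosen to match the text).
table = {("M1", "K1"): "E2", ("M2", "K1"): "E1",
         ("M1", "K2"): "E1", ("M2", "K2"): "E2"}

p_e = {"E1": 0.0, "E2": 0.0}
for (m, k), e in table.items():
    p_e[e] += prior_m[m] * prior_k[k]

def equivocation_after(e):
    # Each cryptogram arises from two (message, key) pairs with distinct
    # messages and distinct keys, so the same a posteriori probabilities
    # serve for the key and for the message.
    post = [prior_m[m] * prior_k[k] / p_e[e]
            for (m, k), e2 in table.items() if e2 == e]
    return h(post)

before = h([p, q])
after_e1 = equivocation_after("E1")
after_e2 = equivocation_after("E2")
mean_after = p_e["E1"] * after_e1 + p_e["E2"] * after_e2
print(before, after_e1, after_e2, mean_after)
```

Under these assumptions intercepting E1 raises the equivocation to one alternative, while the average over E1 and E2 is lower than the a priori value.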

Theorem 12:  The mean equivocation of key, Q_K(N), is a non-increasing
             function of N. The mean equivocation of the
             first A letters of the message is a non-increasing
             function of the number N which have been intercepted.
             If N letters have been intercepted, the equivocation
             of the first N letters of message is less than or
             equal to that of the key. These may be written

                 Q_K(S) \le Q_K(N),     S \ge N
                 Q_M(S) \le Q_M(N),     S \ge N
                 Q_M(N) \le Q_K(N)

The qualification regarding A letters in the second
result of the theorem is so that the equivocation will not be
calculated with respect to the amount of message that has been
intercepted. If it is, the message equivocation may (and usually
does) increase for a time, due merely to the fact that more
letters stand for a larger possible range of messages. The
results of the theorem are what we might hope from a good measure
of equivocation, since we would hardly expect to be worse off on
the average after intercepting material than before. The fact
that they can be proved gives additional justification to our
use of the equivocation measure.

The results of this theorem can be proved by a substitution
in the property 6 of section 1. Thus to prove the
first or second we have for any chance events A and B

    Q(B) \ge Q_A(B)

If we identify B with the key (knowing the first S letters of
cryptogram) and A with the remaining N - S letters we obtain
the first result. Similarly identifying B with the message
gives the second result. The last result follows from

    Q(M) \le Q(K) + Q_K(M)

and the fact that Q_K(M) = 0 since K uniquely determines M.

Theorem 13:      Q(K) = |M| - |E| + |K|

                 Q(M) = |M| - |E| + |H|

             where |H| = -\sum_{M,E} P(M,E) \log P_M(E).

We have

    Q(K) = -\sum_{K,E} P(K) P_K(E) \log \frac{P(K) P_K(E)}{P(E)}

         = -\sum P(K) P_K(E) \log P(K) - \sum P(K) P_K(E) \log P_K(E)
           + \sum P(K) P_K(E) \log P(E)

Summing the first term on E gives -\sum P(K) \log P(K) = |K|.
In the second term P_K(E) is P(M), the unique M that gives E
with key K. Summing on K then gives -\sum P(M) \log P(M) = |M|.
The third term is \sum P(E) \log P(E) = -|E|.
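
The first equation of Theorem 13 can be checked on a small case. The sketch below is an illustration only: it assumes a hypothetical one-letter "language" on three symbols enciphered by a Caesar-type shift modulo 3, with made-up message and key probabilities, and compares Q(K) computed from its definition with |M| - |E| + |K|.

```python
import math
from collections import defaultdict

def h(probs):
    """-sum p log2 p, in alternatives."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

p_msg = {0: 0.6, 1: 0.3, 2: 0.1}      # hypothetical message probabilities
p_key = {0: 0.5, 1: 0.3, 2: 0.2}      # hypothetical key probabilities

p_e = defaultdict(float)              # P(E)
p_ke = defaultdict(float)             # P(K, E)
for m, pm in p_msg.items():
    for k, pk in p_key.items():
        e = (m + k) % 3               # each key is a one-to-one map M -> E
        p_e[e] += pm * pk
        p_ke[(k, e)] += pm * pk

# Mean key equivocation from the definition: -sum P(K,E) log P_E(K).
q_k = -sum(p * math.log2(p / p_e[e]) for (k, e), p in p_ke.items() if p > 0)

# Right-hand side of the theorem: |M| - |E| + |K|.
rhs = h(p_msg.values()) - h(p_e.values()) + h(p_key.values())
print(q_k, rhs)
```

The two printed values agree to rounding error, as the derivation above requires whenever each key determines a unique M for a given E.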


The second equation in the theorem is proved by the
same method.

    Q(M) = -\sum P(E) P_E(M) \log P_E(M)

         = -\sum P(M) P_M(E) \log \frac{P(M) P_M(E)}{P(E)}

         = -\sum P(M) P_M(E) \log P(M) - \sum P(M) P_M(E) \log P_M(E)
           + \sum P(M) P_M(E) \log P(E)

         = |M| - |E| - \sum P(M) P_M(E) \log P_M(E)

The last term here is interpreted as follows. Group together
all the different keys that transform a fixed M into
the same E, giving the total probability to the group, which
will be P_M(E). The last term is the average size of this group
space weighted according to the probability P(M) of choosing
among the groups leading out of M. In case no group contains
more than one element (at any rate no group from an M with
P(M) > 0) then |H| = |K| and Q(K) = Q(M). This is also clear
since there is then a one-to-one correspondence between the
keys and messages for any given E.

From the first equation of the theorem we may conclude
that Q(K) = |K| in case |M| = |E|. This latter occurs in particular
if all M's are equally likely and all E's equally likely
and there are the same number of each. It is easy to see that
this is the case with a language in which every letter is equally
likely and independent, and when almost any of the simple ciphers
are used.

If we have a product system S = T R, it is to be expected
that the second enciphering process does not decrease
the equivocation of message, and this is actually true as can
be shown by the methods used above. If T and R commute either
may be considered as being the first and hence in this case
the equivocation with S is not less than the maximum for the
two systems R and T. Simple examples show that this does not
hold necessarily if R and T do not commute.

Theorem 14:  The equivocation in message of a product
             system S = T R is not less than that when
             only R is used. If T R = R T it is not less
             than the maximum of those for R and T alone.


If we have a product of several systems R S T U, we
can of course extend this, to say that the equivocation of
R S T U is not less than that of S T U, which is not less than
that for T U, etc.

There is no similar theorem for the inner product since
for example if T and R are inverse processes their inner product
is the identity and the resulting equivocation zero.

Suppose we have a system T which can be written as a
weighted sum of several systems R, S, ..., U

    T = p_1 R + p_2 S + ... + p_m U,      \sum p_i = 1

and that systems R, S, ..., U have equivocation characteristics
Q_1, Q_2, ..., Q_m.

Theorem 15:  The equivocation Q of a weighted sum of
             systems is bounded by the inequalities

                 \sum p_i Q_i \le Q \le \sum p_i Q_i - \sum p_i \log p_i

These are best limits possible. The Q's may refer either to
key or to message.

The upper limit is achieved, for example, in strongly
ideal systems (to be described later) where the decomposition
is into the simple transformations of the system. The lower
limit is achieved if all the systems R, S, ..., U go to completely
different cryptogram spaces. This theorem is also proved
by the general inequalities governing equivocation,

    Q_A(B) \le Q(B) \le Q(A) + Q_A(B).

We identify A with the particular system being used and B with
the key or message.

There is a similar theorem for weighted sums of
languages.

Theorem 16:  Suppose a system can be applied to languages
             L_1, L_2, ..., L_m and has equivocation charac-
             teristics Q_1, Q_2, ..., Q_m. When applied to
             the weighted sum \sum p_i L_i, the equivocation Q
             is bounded by

                 \sum p_i Q_i \le Q \le \sum p_i Q_i - \sum p_i \log p_i


These limits are the best possible and the equivocations in
question can be either for key or message.

The proof here is essentially the same as for the
preceding case.

An important consequence of the result

    Q(K) = |K| + |M| - |E|

is the following.

Theorem 17:  In any closed system, or any system where
             the total number of possible cryptograms is
             equal to the number of possible messages
             of N letters,

                 Q(K) \ge |K| - (|M_0| - |M|) = |K| - D_N

             where |M_0| = \log H, with H the number of possible
             messages of N letters. D_N is the total
             redundancy for N letters.

This is true since |M_0| \ge |E|, the equality holding
only if all cryptograms are equally likely. The theorem states
that in a closed system the key is determined only by the
redundancy of the language - the equivocation can decrease only
as the redundancy comes into action and at no greater rate.

Suppose we have a pure system and let the different
residue classes of messages be C_1, C_2, ..., C_r. The corresponding
set of residue classes of cryptograms is C_1', ..., C_r'.
The probability of each E in C_i' is the same:

    P(E) = \frac{P(C_i)}{\varphi_i},      E \in C_i'

where \varphi_i is the number of different messages in C_i. Thus

    |E| = -\sum_i P(C_i) \log \frac{P(C_i)}{\varphi_i}


Substituting in our equation for Q, we obtain:

Theorem 18:  For a pure cipher

                 Q = |K| + |M| + \sum P(C_i) \log \frac{P(C_i)}{\varphi_i}

This result can be used to compute Q in many cases of interest.

From the analytic point of view pure ciphers have a
simple structure. If a cryptogram is intercepted its residue
class gives the complete information obtained by the cryptanalyst.
Within the residue class the system is perfect - each message
in the class has an a posteriori probability equal to its
a priori probability. For large N, beyond the unicity point,
there will usually only be one M in the class of reasonable
probability, and the problem is to determine this M.

The theorem on equivocation of pure ciphers can be
altered to show this. We have

    \sum P(C_i) \log \frac{P(C_i)}{\varphi_i}
        = \sum P(C_i) \log P(C_i) + \sum P(C_i) \log \frac{k}{\varphi_i}
          - \sum P(C_i) \log k

        = \sum P(C_i) \log P(C_i) + Q_M(K) - |K|

    Q(K) = |K| + |M| + \sum P(C_i) \log \frac{P(C_i)}{\varphi_i}

         = |M| + Q_M(K) + \sum P(C_i) \log P(C_i)

    Q(M) = |M| - [ -\sum P(C_i) \log P(C_i) ]

The equivocation of message is the equivocation of message before
the cryptogram was intercepted less the information imparted by
specification of its residue class.

21.  Key Appearance Characteristic

Suppose the cryptanalyst has N letters of message
and N letters of the equivalent cryptogram. Then he can cal-
culate the a posteriori probabilities of the various keys on
the basis of this information, and if N is small there will
remain a certain equivocation of key. For example in simple
substitution, knowing 20 letters of message and cryptogram
does not disclose the entire key, since only about 12 letters
of the 26 will be represented. Thus there is a residual
equivocation of log (26-12)!, if exactly 12 letters appear.
We define the mean residual key equivocation as

    Q_M(K) = -\sum_{E,M,K} P(E,M) P_{E,M}(K) \log P_{E,M}(K)

where P(E,M) is the a priori probability of having message M
and cryptogram E, and P_{E,M}(K) is the conditional probability
of K with E and M given.

This may be written by obvious arguments (assuming
all keys equally likely)

    Q_M(K) = \sum P(M,K) \log X(M,K)

where X(M,K) is the number of different keys from M in parallel
with K, that is which go to the same E as K.

For simple substitution let P_X be the probability
that a received cryptogram of N letters has X different letters
appearing in it. Then

    Q_M(K) = \sum_X P_X \log (26 - X)!


log  lbgV^26A) 

,  r 

The bracketed terms vary slowly with X and if P_X is fairly
well concentrated, we may take the bracket out replacing X
by its mean value \bar{X}. This gives, after recombination

    Q_M(K) \approx \log (26 - \bar{X})!

This residual key equivocation is shown for simple substitution
on English in Fig. 12. It measures how much of the
key has not been used in enciphering N letters of text on
the average.
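
The approximation log (26 - \bar{X})! is easy to evaluate. The following sketch is only illustrative: it ASSUMES independent letters with the rough English letter frequencies in FREQ (approximate textbook percentages, not taken from the memorandum), computes the mean number of distinct letters in N letters, and uses the gamma function to extend the factorial to a non-integer mean. Results are in decimal digits.

```python
import math

# Approximate English letter frequencies in percent (an assumption made
# for this example; the memorandum's own figures are not reproduced here).
FREQ = {
    "E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7, "S": 6.3,
    "H": 6.1, "R": 6.0, "D": 4.3, "L": 4.0, "C": 2.8, "U": 2.8, "M": 2.4,
    "W": 2.4, "F": 2.2, "G": 2.0, "Y": 2.0, "P": 1.9, "B": 1.5, "V": 1.0,
    "K": 0.8, "J": 0.15, "X": 0.15, "Q": 0.10, "Z": 0.07,
}

def mean_distinct_letters(n, freq=FREQ):
    """Expected number of different letters among n independent letters."""
    total = sum(freq.values())
    return sum(1.0 - (1.0 - f / total) ** n for f in freq.values())

def log10_factorial(x):
    """log10 of x!, extended to non-integer x through the gamma function."""
    return math.lgamma(x + 1.0) / math.log(10.0)

def residual_key_equivocation(n):
    """Approximation log (26 - X_bar)! to the mean residual key equivocation."""
    return log10_factorial(26.0 - mean_distinct_letters(n))

for n in (0, 10, 20, 40, 80):
    print(n, round(mean_distinct_letters(n), 2),
          round(residual_key_equivocation(n), 2))
```

With these assumed frequencies the mean number of distinct letters at N = 20 comes out near 12, in line with the figure quoted in the text.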

Theorem 19:      Q(K) = Q(M) + Q_M(K)

That is, the total key equivocation (when we don't know the
message) is the sum of the message equivocation and the residual
key equivocation; i.e., the equivocation there would
be in the key if we did know the message. This follows from
the fact that the key uniquely determines the message and
properties 4 and 5 in Section 1.

22.  Equivocation for Simple Substitution on an Independent
     Letter Language.

We will now calculate the mean equivocation in key
or message when simple substitution is applied to a two
letter language, probabilities p and q for 0 and 1, with
successive letters independent. We have

    Q_M = Q_K = -\sum P(E) P_E(K) \log P_E(K)

The probability that E contains exactly s 0's in a particular
permutation is

    \frac{1}{2} ( p^s q^{N-s} + q^s p^{N-s} )

and the a posteriori probabilities of the identity and inverting
substitutions are respectively

    \frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}     and     \frac{p^{N-s} q^s}{p^s q^{N-s} + q^s p^{N-s}}

There are \binom{N}{s} terms for each s and hence

    Q(N) = -\sum_s \binom{N}{s} p^s q^{N-s} \log \frac{p^s q^{N-s}}{p^s q^{N-s} + q^s p^{N-s}}

This may be written

    Q(N) = -\sum_s \binom{N}{s} p^s q^{N-s} [ s \log p + (N-s) \log q
                                              - \log ( p^s q^{N-s} + q^s p^{N-s} ) ]

         = -N [ p \log p + q \log q ]
           + \sum_s \binom{N}{s} p^s q^{N-s} \log ( p^s q^{N-s} + q^s p^{N-s} )

         = NR + \frac{1}{2} \sum_s \binom{N}{s} ( p^s q^{N-s} + q^s p^{N-s} ) \log ( p^s q^{N-s} + q^s p^{N-s} )

For p = 1/3, q = 2/3, and for p = 1/8, q = 7/8, Q has been calculated
and is shown in Fig. 13.
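
The closed form for the two letter language can be compared with a direct enumeration of all cryptograms. The sketch below is an added illustration in alternatives (base-2 logarithm); the function names are introduced here, and the brute-force version simply applies the definition of mean key equivocation to the two equally likely keys (identity and inverting substitution).

```python
import math
from itertools import product

def q_formula(n, p):
    """Q(N) = -sum_s C(N,s) p^s q^(N-s) log2[ p^s q^(N-s) / (p^s q^(N-s) + q^s p^(N-s)) ]."""
    q = 1.0 - p
    total = 0.0
    for s in range(n + 1):
        a = p ** s * q ** (n - s)
        b = q ** s * p ** (n - s)
        total -= math.comb(n, s) * a * math.log2(a / (a + b))
    return total

def q_bruteforce(n, p):
    """Mean key equivocation by enumerating every cryptogram of n letters."""
    q = 1.0 - p
    total = 0.0
    for e in product((0, 1), repeat=n):
        s = e.count(0)
        joint_identity = 0.5 * p ** s * q ** (n - s)   # P(K = identity, E)
        joint_inverting = 0.5 * q ** s * p ** (n - s)  # P(K = inverting, E)
        p_e = joint_identity + joint_inverting
        for joint in (joint_identity, joint_inverting):
            total -= joint * math.log2(joint / p_e)
    return total

for n in range(0, 9, 2):
    print(n, round(q_formula(n, 1 / 3), 4), round(q_formula(n, 1 / 8), 4))
```

At N = 0 the formula gives one alternative (the key size), and at N = 1 it gives R, since a single letter reveals nothing beyond its own a priori uncertainty.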

Now assume the language contains r different
letters chosen independently and with probabilities p_1,
p_2, ..., p_r. By approximately the same argument we have

    Q(N) = -\sum \binom{N}{s_1 \ldots s_r} p_1^{s_1} p_2^{s_2} \cdots p_r^{s_r}
            \log \frac{p_1^{s_1} \cdots p_r^{s_r}}{\sum_a p_{a_1}^{s_1} \cdots p_{a_r}^{s_r}}

where \sum s_i = N and \sum_a is over all permutations of 1, 2, ..., r
for a_1, ..., a_r.

Hence, by obvious transformations

    Q(N) = NR + \frac{1}{r!} \sum \binom{N}{s_1 \ldots s_r}
           [ \sum_a p_{a_1}^{s_1} \cdots p_{a_r}^{s_r} ] \log \sum_a p_{a_1}^{s_1} \cdots p_{a_r}^{s_r}

where R = -\sum p_i \log p_i. In particular,

    Q(0) = \frac{1}{r!} r! \log r! = \log r! = |K|

    Q(1) = R + \frac{1}{r!} r (r-1)! \log (r-1)!
         = R + \log (r-1)!

This checks the evident answer for Q(1) - the first
symbol has equivocation R and the parts of the key not used
add log (r-1)!.
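
The two special values Q(0) and Q(1) can be confirmed by brute force for a small alphabet. This sketch is an added illustration: it assumes a hypothetical three-letter language with made-up probabilities, enumerates all r! substitution keys (equally likely) and all cryptograms of N letters, and applies the definition of mean key equivocation directly, in alternatives.

```python
import math
from itertools import permutations, product

def key_equivocation(probs, n):
    """Mean key equivocation for simple substitution on an independent-letter
    language with letter probabilities `probs`, after n intercepted letters."""
    r = len(probs)
    keys = list(permutations(range(r)))      # key[m] = cipher letter for m
    p_key = 1.0 / len(keys)
    total = 0.0
    for e in product(range(r), repeat=n):
        joint = []
        for key in keys:
            inverse = {c: m for m, c in enumerate(key)}
            p_msg = 1.0
            for c in e:
                p_msg *= probs[inverse[c]]   # probability of the deciphered text
            joint.append(p_key * p_msg)      # P(K, E)
        p_e = sum(joint)
        total -= sum(j * math.log2(j / p_e) for j in joint if j > 0)
    return total

probs = [0.5, 0.3, 0.2]                      # hypothetical letter probabilities
rate = -sum(p * math.log2(p) for p in probs) # R
q0 = key_equivocation(probs, 0)
q1 = key_equivocation(probs, 1)
q2 = key_equivocation(probs, 2)
print(q0, q1, q2)
```

Here q0 should equal log 3! and q1 should equal R + log 2!, matching the special cases stated above.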

23.  The Equivocation Characteristic for a "Random" Closed
     Cipher


In the preceding section we have calculated the
equivocation characteristic for a simple substitution applied
to an independent letter language. This is about the simplest
type of cipher and the simplest language structure possible,
yet already the formulas are so involved as to be nearly
useless. What are we to do with cases of practical interest,
say the involved transformations of a fractional transposition
system applied to English with its extremely complex
statistical structure? This complexity itself suggests the
method of approach. Sufficiently complicated problems can
frequently be solved statistically. In order to do this we
define the notion of a "random" cipher.


We suppose that the possible messages of length N
can be divided into two groups, one group of high and fairly
uniform probability, while the total probability in the
second group is small. This is usually possible in informa-
tion theory if the messages have any reasonable length. Let
the total number of messages be

    H = 2^{R_0 N}

where R_0 is the maximum rate and N the number of letters. The
high probability group will contain about

    S = 2^{R N}

where R is the statistical rate.

The deciphering operation defines a function giving M for
each E and key, which can be thought of as a series of lines,
k for each E, going back to various M's. By a random cipher we
will mean one in which all keys are equally likely and the k
lines from any E go back to random M's. The equivocation in key
is given by

    Q(K) = -\sum P(E) P_E(K) \log P_E(K)

The probability of exactly m lines going back
to the high probability group is

    \binom{k}{m} (S/H)^m (1 - S/H)^{k-m}

If a cryptogram with m lines going to high probability messages
is intercepted, the equivocation is log m. The probability
of intercepting such a cryptogram is easily seen to be
mH/(Sk). Hence the mean equivocation is

    Q = \frac{H}{S k} \sum_{m=1}^{k} \binom{k}{m} (S/H)^m (1 - S/H)^{k-m} m \log m

We wish to find an approximation to this for large k. If the
expected value of m, namely \bar{m} = (S/H) k, is >> 1, the variation of
log m over the range where the binomial distribution assumes
large values will be small and we can replace log m by log \bar{m}.
This then comes out of the summation leaving the expected value
of m. Hence in this condition

    Q = \log (S k / H)

      = \log S - \log H + \log k

      = |K| - |M_0| + |M|

      = |K| - N D.

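If \bar{m} is small compared to the large k, the binomial distribution
can be approximated by a Poisson distribution:*

    \binom{k}{m} (S/H)^m (1 - S/H)^{k-m} \approx \frac{e^{-\lambda} \lambda^m}{m!},      \lambda = \frac{S k}{H}

Hence

    Q \approx \frac{1}{\lambda} e^{-\lambda} \sum_{m=2}^{\infty} \frac{\lambda^m}{m!} m \log m

      = e^{-\lambda} \sum_{m=1}^{\infty} \frac{\lambda^m}{m!} \log (m+1)

when we write (m + 1) for m. This may be used in the region
where \lambda is near unity. For \lambda << 1 the only important term in
the series is m = 1; omitting the others

    Q \approx e^{-\lambda} \lambda \log 2

      \approx \lambda \log 2

      = 2^{|K|} 2^{-ND} \log 2

*Fry, Probability and Its Engineering Uses, p. 214.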

Thus Q(K) starts off at |K|, and decreases linearly
with slope -D out to the neighborhood of N = |K|/D. After a
short transition region, Q follows an exponential with "half
life" distance 1/D if D is in alternatives per letter. If D
is in digits per letter 1/D is the distance for a decrease
by a factor of 10. The behavior is shown in Fig. 14 with
the approximating curves.
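
The shape of this characteristic can be explored numerically. The sketch below is an illustration under stated assumptions: it uses the Poisson form of the sum for every N (which the text derives for \bar{m} small compared to k), works in alternatives, and takes a hypothetical key size of 20 alternatives and a redundancy of 1 alternative per letter. The function name and the truncation width of the sum are choices made for this example.

```python
import math

def random_cipher_key_equivocation(key_bits, redundancy_bits, n):
    """Poisson form Q = e^(-lam) * sum_{m>=1} lam^m / m! * log2(m + 1),
    with lam = S k / H = 2^(|K| - N D)."""
    lam = 2.0 ** (key_bits - redundancy_bits * n)
    # Sum only where the Poisson weights are non-negligible.
    width = 12.0 * math.sqrt(lam) + 30.0
    lo = max(1, int(lam - width))
    hi = int(lam + width)
    total = 0.0
    for m in range(lo, hi + 1):
        # Weight computed in logarithms to avoid overflow for large lam.
        log_weight = -lam + m * math.log(lam) - math.lgamma(m + 1.0)
        total += math.exp(log_weight) * math.log2(m + 1.0)
    return total

KEY_BITS = 20.0      # hypothetical |K| in alternatives
D_BITS = 1.0         # hypothetical redundancy in alternatives per letter

for n in (0, 5, 10, 15, 18, 20, 22, 25, 30):
    q = random_cipher_key_equivocation(KEY_BITS, D_BITS, n)
    linear = max(KEY_BITS - D_BITS * n, 0.0)           # |K| - N D
    tail = 2.0 ** (KEY_BITS - D_BITS * n)              # lam * log 2, in alternatives
    print(n, round(q, 5), linear, tail)
```

Well before N = |K|/D the computed value tracks the straight line |K| - ND, and well beyond it the value tracks the exponential tail, halving with each additional letter when D is one alternative per letter.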

By a similar argument given in the appendix, the
equivocation of message can be calculated. It is

    Q(M) = |M_0| = R_0 N         for R_0 N << Q(K) = |K| - DN

    Q(M) = Q(K)                  for R_0 N >> Q(K)

    Q(M) = Q(K) - \varphi(N)     for R_0 N \approx Q(K)

where \varphi(N) is the function of Fig. 14, with N scale reduced
by a factor of D/R_0. Q(M) rises linearly with slope R_0 until
this line intersects the Q(K) line. After a rounded transition
it follows Q(K) down.

Most ciphers have an equivocation characteristic
of this general type, approaching zero rather sharply. We
will call the number of letters required for near unicity of
solution the unicity distance.

24.  Application to Standard Ciphers.

The characteristic derived for the random cipher
may be expected to apply approximately in many cases, providing
some precautions are taken and certain corrections
are made. The main points to be observed are the following:

1.  We assumed in deriving the random characteristic
    that the possible decipherments of a cryptogram
    are a random selection from the possible messages.
    This is not true in actual cases, but becomes more
    nearly true as the complexity of the operations
    used in the enciphering process and the complexity
    of the language structure increase. The more com-
    plicated the type of cipher, the more it should
    follow the random characteristic. In the case of
    a transposition cipher it is clear that letter
    frequencies are preserved. This means that the
    possible decipherments are chosen from a more
    limited group - not the entire message space -
    and the formula should be changed. In place of
    R_0 one uses R_1, the rate for independent letters
    but with the regular frequencies. This changes
    the redundancy from

        D = R_0 - R = .707 digits/letter

    to

        D' = R_1 - R = .538 digits/letter

    and the equivocation reduces more slowly. In
    some other cases a definite tendency toward returning
    the decipherments to high probability
    messages can be seen. If there is no clear
    tendency of this sort, and the system is fairly
    complicated, and the language a natural one
    (with its very complex statistical structure) -
    then it is reasonable to make the random cipher
    assumption.

2.  In many cases the key does not all appear as
    soon as it might. For example in simple substitution
    one must wait for a long time to find
    all letters of the alphabet represented in the
    message and thus deduce the complete key. The
    message becomes unique long before this point.
    Obviously our random assumption falls down in
    such a case, since all the different keys which
    differ only in the letters not yet appearing
    lead back to the same message, and are not randomly
    distributed. This error is easily corrected
    by the use of the key appearance characteristic.
    One uses at a particular N, the amount
    of key that may be expected at that point in the
    formula for Q.

3.  There are certain "end effects" due to the definite
    starting of the message which produce a discrepancy
    from the random characteristics. If we take a
    random starting point in English text the first
    letter (when we do not observe the preceding
    letters) has a possibility of being any letter with
    the ordinary letter probabilities. The next
    letter is more completely specified since we
    then have digram frequencies. This decrease
    in choice value continues for some time. The
    effect of this on the curve is that the straight
    line part is displaced, and approached by a
    curve depending on how much the statistical
    structure of the language is spread out over
    adjacent letters. As a first approximation
    the curve can be corrected by shifting the line
    over to the half redundancy point - i.e., the
    number of letters where the language redundancy
    is half its final value.

If account is taken of these three effects, rea-
sonable estimates of the equivocation characteristic and
unicity point can be made. The calculation can be done
graphically as indicated in Figs. 15 and 16. One draws the
key appearance characteristic |K| - Q_M(K) and the total
redundancy curve |M_0| - |M| (which is usually sufficiently
well represented by the line ND). The difference between
these out to the neighborhood of their intersection is Q(M).
For the simple substitution the characteristic is shown
in Fig. 17. In so far as experimental checks could be carried
out they fit this curve very well. For example, the
unicity point, at about 27 letters, can be shown experimentally
to lie between the limits 22 and 30. With 30 letters
one nearly always has a unique solution to a cryptogram of
this type and with 22 it is usually easy to find a number
of solutions.

With transposition of period d, the unicity point
occurs at about 1.5 d log (d/e). This also checks fairly well
experimentally. Note that in this case Q is defined only
for integral multiples of d.

With the Vigenere the unicity point will occur at
about 2d + 2 letters, and this too is about right. The
Vigenere characteristic with the same key size as simple sub-
stitution will be approximately as shown in Fig. 18. The
Vigenere, Playfair and Fractional cases are more likely to
follow the theoretical formulas for random ciphers than
simple substitution and transposition. The reason for this
is that they are more complex and give better mixing characteristics
to the messages on which they operate.
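
The leading term of the Vigenere estimate follows from the random cipher rule that Q(K) reaches zero near N = |K|/D. The sketch below is an added illustration: it takes the key size of a period-d Vigenere as d log 26 and uses the redundancy D = .707 digits per letter quoted earlier in this section; the additive end-effect correction (the "+ 2") is not modelled, and the function names are introduced here.

```python
import math

def unicity_distance(key_size, redundancy_per_letter):
    """Random-cipher estimate N = |K| / D (both in the same logarithmic units)."""
    return key_size / redundancy_per_letter

def vigenere_key_size(d):
    """|K| = d log10 26 digits for d independently chosen shifts."""
    return d * math.log10(26)

D = 0.707   # digits per letter, as quoted in the text

for d in (5, 10, 20):
    print(d, round(unicity_distance(vigenere_key_size(d), D), 2))
```

For each period the estimate comes out very close to 2d letters.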

The mixed alphabet Vigenere (each of d alphabets
mixed independently and used sequentially) has a key size

    |K| = d \log 26! = 26.3 d

and its unicity point should be at about 53 d + 2 letters.

These conclusions can also be put to a rough ex-
perimental test with the Caesar type cipher. In the parti-
cular cryptogram analyzed in Table I, section 19, the func-
tion Q(N) has been calculated and is given below, together
with the values for a random cipher.

    N                  0     ...
    Q (observed)       1.41  ...
    Q (calculated)     1.41  ...

The agreement is seen to be quite good, especially
when we remember that the observed Q should actually be the
average of many different cryptograms, and that D for the
larger values of N is only roughly estimated.

It  appears  then  that  the  random  cipher  analysis 
can  be  used  to  estimate  equivocation  characteristics  and 
the  unicity  distance  for  the  ordinary  types  of  ciphers. 

25.  Solving Systems Using Only N-Gram Structure.

The preceding analysis can also be applied to cases
where the cryptanalyst is assumed to know or use only a
limited knowledge of the structure of the language. If no
data about the language other than the digram frequencies
is used in solving cryptograms the equivocation curves may
be computed, using for the redundancy curve that obtained
from D_2 alone. This curve lies below the curve for all re-
dundancy and the unicity point will therefore be moved to
a larger N. Fig. 19 shows the Q curves for simple substitution
on normal English when the cryptanalyst uses only
digram structures.

26.  Validity of a Cryptogram Solution.

The equivocation formulas are relevant to questions
which sometimes arise in cryptographic work regarding the
validity of an alleged solution to a cryptogram. In the
history of cryptography one finds many cryptograms, or
possible cryptograms, where clever analysts have found a
"solution". It involved, however, such a complex process, or
the material was so scanty, that the question arose as to
whether the cryptanalyst had "read a solution" into the
cryptogram. See for example the Bacon-Shakespeare ciphers
and the "Roger Bacon" manuscript.*

In general we may say that if a proposed system
and key solves a cryptogram for a length of material considerably
greater than the unicity distance the solution is trustworthy.
If the material is of the same order or shorter
than the unicity distance the solution is highly suspicious.

This effect of redundancy in gradually producing a
unique solution to a cipher can be thought of in another way
which is helpful. The redundancy is essentially a series of
conditions on the letters of the message, which insure that
it be statistically reasonable. These consistency conditions
produce corresponding consistency conditions in the
cryptogram. The key gives a certain amount of freedom to the
cryptogram, but as more and more letters are intercepted,
the consistency conditions use up the freedom allowed by the
key. Eventually there is only one message and key which
satisfy all the conditions and we have a unique solution.
In the random cipher the consistency conditions are in a
sense "orthogonal" to the "grain of the key", and have their
full effect in eliminating messages and keys as rapidly as
possible. This is the usual case. However, by proper
design it is possible to "line up" the redundancy of the
language with the "grain of the key" in such a way that the
consistency conditions are automatically satisfied and Q
does not approach zero. These "ideal" systems are of such
a nature that the transformations $T_i$ all induce the same
probabilities in the E space. Ideal characteristics are
shown in Fig. 20.

27.    Ideal  Secrecy  Systems. 

We have seen that perfect secrecy requires an
infinite amount of key. With a finite key size, the
equivocation of key and message generally approach zero, but not
necessarily so. In fact it is possible for Q(K) to remain
constant at its initial value |K|. Then, no matter how
much material is intercepted, there is not a unique solution
but many of comparable probability. We will define an
"ideal" system as one in which Q(K) and Q(M) do not approach
zero as N → ∞. A "strongly ideal" system is one in which
Q(K) remains constant at |K|.

*See  Fletcher  Pratt,  "Secret  and  Urgent" 


An example is a simple substitution on an artificial
language in which all letter probabilities are the same and
each letter independently chosen. It is clear that Q(K) = |K|
and Q(M) rises linearly along a line of slope $R_0$ until it
strikes the line Q(K), after which it remains constant at
this value.
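
The shape of this characteristic can be sketched numerically. The following is only an illustration under stated assumptions: a 26-letter alphabet, all 26! substitutions equally likely (so that |K| = log 26!), and logarithms to base 10 so that the unit is the decimal digit. The function name is hypothetical.

```python
import math

def ideal_equivocation(n_letters, alphabet_size=26):
    """Return (Q(K), Q(M)) in decimal digits after n_letters are intercepted,
    for simple substitution on equiprobable, independent letters."""
    r0 = math.log10(alphabet_size)                         # R_0, digits per letter
    key_size = math.log10(math.factorial(alphabet_size))   # |K| (assumed log 26!)
    q_key = key_size                          # Q(K) stays at its initial value
    q_msg = min(n_letters * r0, key_size)     # slope R_0 until it meets Q(K)
    return q_key, q_msg

for n in (0, 5, 10, 20, 50):
    q_key, q_msg = ideal_equivocation(n)
    print(f"N={n:3d}  Q(K)={q_key:6.2f}  Q(M)={q_msg:6.2f}")
```

With these assumptions Q(M) meets the line Q(K) between N = 18 and N = 19 and is flat thereafter.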

With natural languages it is in general possible
to approximate the ideal characteristic - the unicity point
can be made to occur for as large N as is desired. The
complexity of the system needed usually goes up rapidly as
we attempt to do this, however. It is not always possible
to actually attain the ideal characteristic with any system
of finite complexity.

To approximate the ideal equivocation, one may
first operate on the message with a transducer which reduces
it to the normal form, i.e., with all redundancies removed.
After this almost any simple ciphering system - substitution,
transposition, Vigenere, etc. - is satisfactory. The more
elaborate the transducer and the nearer the output is to
normal form, the more closely will the secrecy system
approximate the ideal characteristic.

Theorem 20: A necessary and sufficient condition that T be
strongly ideal is that for any two keys $T_i^{-1} T_j$ is a
measure preserving transformation of the message space
$\Omega_M$ into itself.

This is true since the a posteriori probability
of each key is equal to its a priori probability if and only
if this condition is satisfied.

28.    Examples of Ideal Secrecy Systems.

Suppose our language consists of a sequence of
letters all chosen independently and with equal probability.
Then the redundancy is zero, $|M_0| = |M|$, and from Theorem 11
Q(K) = |K|. We obtain the result

Theorem 21: If all letters are equally likely and independent
any closed cipher is strongly ideal.

The equivocation of message will rise along the
key appearance characteristic, which will usually
approach |K|, although in some cases it does not. In the
cases of N-gram substitution, transposition, Vigenere and
variations, fractional, etc., we have strongly ideal systems
for this simple language with Q(M) → |K| as N → ∞.


If the letters are independent but are not all
equally probable, the transposition cipher characteristics
remain essentially the same. The asymptotic equivocations
of both key and message are clearly |K|. In the substitution
cipher they will be less. If all the letter probabilities are
different, then the asymptotic equivocations of both key and
message are zero. The letters can all eventually be
determined by frequency count (apart from certain exceptional
sequences of zero measure). Suppose now that there are 7
letters with probabilities

    $p_1 = p_2 < p_3 < p_4 = p_5 = p_6 < p_7$

In this case we cannot separate $p_1$ from $p_2$, or $p_4$, $p_5$ and $p_6$
from each other, but the different unequal probability groups
can be eventually separated.

If all substitutions are a priori equally likely,
there will be an asymptotic uncertainty among

    $2! \times 3!$

equally likely (a posteriori) keys. Hence, the asymptotic Q

    $= \log 2! \, 3!$

In general it is clear that the asymptotic equivocation with
a substitution where the different substitutions are equally
likely is

    $Q_\infty(M) = Q_\infty(K) = \log H$

where H is the order of the group of substitutions on the
letter probabilities $p_1 \ldots p_n$ which leave this set invariant.
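
As a small check of this rule, the order H is the product of the factorials of the sizes of the groups of equal probabilities. The sketch below assumes logarithms to base 10 (decimal digits) and uses made-up probability values that merely follow the pattern of the seven-letter example; the function name is hypothetical.

```python
import math
from collections import Counter

def asymptotic_equivocation(probabilities):
    """Return (H, log10 H), where H is the number of substitutions that
    leave the set of letter probabilities invariant."""
    group_sizes = Counter(probabilities).values()   # letters sharing a probability
    order = math.prod(math.factorial(size) for size in group_sizes)
    return order, math.log10(order)

# Illustrative values only: p1 = p2 < p3 < p4 = p5 = p6 < p7.
probs = [0.05, 0.05, 0.10, 0.15, 0.15, 0.15, 0.35]
order, q_inf = asymptotic_equivocation(probs)
print(order, round(q_inf, 3))   # H = 2! * 3! = 12
```

When all probabilities are different every group has size one, H = 1, and the asymptotic equivocation is zero, as stated above.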

More generally we can consider an arbitrary pure
system T and a pure language L. Suppose that T operates
only "locally" on the letters of M in the sense that the nth
letter of cryptogram depends only on n and a certain finite
number of the letters of M in the neighborhood of the nth
one:

    $e_n = f(K, n; m_{n-p}, \ldots, m_{n+p})$.

Then we can show that there is a certain subgroup of the
transformations $T_i^{-1} T_j$ which are probability preserving in the
language L. In the limiting cases these would consist of
the identity or of the whole group $T_i^{-1} T_j$.

Theorem 22: Under these conditions the asymptotic equivocation
of key is the logarithm of the order of this subgroup of
measure preserving transformations.

An ideal secrecy system suffers from a number of
disadvantages.

1.  The system must be closely matched to the language.
    This requires an extensive study of the structure
    of the language by the designer. Also a change in
    statistical structure or a selection from the set
    of possible messages as in the case of probable
    words (words expected in this particular cryptogram)
    renders the system vulnerable to analysis.

2.  The structure of natural languages is extremely
    complicated, and this reflects in a complexity of
    the transformations required to reduce them to
    the normal form. Thus any machine to perform this
    operation must necessarily be quite involved, at
    least in the direction of information storage,
    since a "dictionary" of magnitude greater than
    that of an ordinary dictionary is to be expected.

3.  In general, reduction of a natural language to a
    normal form introduces a bad propagation of error
    characteristic. Error in transmission of a single
    letter produces a region of changes near it of
    size comparable to the length of statistical
    effects in the original language.

29.    Multiple Substitute Ideal Systems.

There is another way of obtaining ideal or nearly
ideal characteristics using multi-valued secrecy systems.
Suppose our language contains only three letters with
probabilities 1/8, 3/8 and 4/8, and that successive letters
in a message are chosen independently. Let there be 1
substitute for the first letter, 3 for the second and 4 for
the third, and choose at random among the possible
substitutes for a letter. It is clear that this system is ideal.
If the different probabilities are incommensurable, we cannot
exactly achieve the ideal behavior, but can approximate it,
by using enough substitutes, as closely as desired.
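
The three-letter example can be verified directly. In the sketch below the letter names "a", "b", "c" and the particular assignment of the substitutes 0 to 7 are assumptions made for illustration; only the counts 1, 3 and 4 come from the text. Each substitute then appears in the cryptogram with probability 1/8, so the cryptogram carries no letter-frequency information.

```python
import random
from fractions import Fraction

LETTER_PROBS = {"a": Fraction(1, 8), "b": Fraction(3, 8), "c": Fraction(4, 8)}
SUBSTITUTES = {"a": [0], "b": [1, 2, 3], "c": [4, 5, 6, 7]}   # hypothetical key

def substitute_distribution(letter_probs, substitutes):
    """Probability of each substitute when one is chosen uniformly at
    random among those allotted to the message letter."""
    return {s: letter_probs[letter] / len(subs)
            for letter, subs in substitutes.items() for s in subs}

def encipher(message, substitutes, rng):
    return [rng.choice(substitutes[ch]) for ch in message]

def decipher(cryptogram, substitutes):
    inverse = {s: letter for letter, subs in substitutes.items() for s in subs}
    return "".join(inverse[s] for s in cryptogram)

print(substitute_distribution(LETTER_PROBS, SUBSTITUTES))
rng = random.Random(1)
print(decipher(encipher("cabbcc", SUBSTITUTES, rng), SUBSTITUTES))
```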

If the language is more complex, with transition
probabilities, this general method can still be used, but it
becomes more involved. Suppose the choice of a letter
depends only on the two preceding letters, not on any more
remote part of the message. The transition probabilities
$p_{ij}(k)$ completely describe the statistical structure of the
language. We supply substitutes for k when it follows i, j in
proportion to $p_{ij}(k)$. Of all our m substitutes $m\,p_{ij}(k)$
represent k after the pair i, j. As before one chooses from
the possible substitutes for a letter at random. The
cryptogram will then be a random sequence of the m substitutes.

As an example, suppose the $p_i(j)$ are the only
statistics of the language and the values are given by

    i \ j      1      2      3

      1       .1     .3     .6
      2       .2     .5     .3
      3       .9     .1      0

With 10 substitutes 0, 1, 2, ..., 9 we construct a substitution
table assigning substitutes (chosen randomly) in proportion
to the frequencies. The following is a typical key.

    i \ j      1                      2             3

      1        7                      0,5,6         1,2,3,4,8,9
      2        3,9                    1,2,5,6,7     0,4,8
      3        0,1,2,3,5,6,7,8,9      4             -

If a 3 follows a 2 in the message we substitute one of 0, 4, 8
for it, the choice being random. A second table must be
supplied for the first letter of the message, corresponding to the
unconditional probabilities of the three letters.
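
The construction of such a key can be sketched as follows. The transition table is the one given in the example; everything else (function names, the random seed, and in particular the first-letter table, which is a placeholder and is not matched to the true unconditional probabilities) is an assumption made for illustration.

```python
import random

# TENTHS[i][j] = 10 * p_i(j), the probability that letter j follows letter i.
TENTHS = {
    1: {1: 1, 2: 3, 3: 6},
    2: {1: 2, 2: 5, 3: 3},
    3: {1: 9, 2: 1, 3: 0},
}

def make_key(tenths, rng):
    """For each preceding letter i, divide the digits 0..9 at random among
    the letters j, giving j a share of 10 * p_i(j) substitutes."""
    key = {}
    for i, row in tenths.items():
        digits = list(range(10))
        rng.shuffle(digits)
        key[i], start = {}, 0
        for j, count in row.items():
            key[i][j] = sorted(digits[start:start + count])
            start += count
    return key

def invert(table):
    return {s: j for j, subs in table.items() for s in subs}

def encipher(message, key, first_table, rng):
    out = [rng.choice(first_table[message[0]])]
    for prev, cur in zip(message, message[1:]):
        out.append(rng.choice(key[prev][cur]))   # fails if p_prev(cur) = 0
    return out

def decipher(cryptogram, key, first_table):
    message = [invert(first_table)[cryptogram[0]]]
    for s in cryptogram[1:]:
        message.append(invert(key[message[-1]])[s])   # row set by previous letter
    return message

rng = random.Random(0)
key = make_key(TENTHS, rng)
first_table = {1: [0, 1, 2, 3], 2: [4, 5, 6], 3: [7, 8, 9]}   # placeholder only
message = [1, 3, 1, 2, 2, 3, 2]
cryptogram = encipher(message, key, first_table, rng)
print(cryptogram, decipher(cryptogram, key, first_table))
```

Deciphering works letter by letter because the row of the table to consult is fixed by the message letter just recovered.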

Although of theoretical interest it is doubtful
whether such systems would be of much use practically because
of their complexity and message expansion in ordinary cases.
However, the first approximation to such systems, matching
letter frequencies, has been used in ciphers and is standard
practice in codes (where one matches word frequencies).

30.    Equivocation Rate.

We now return briefly to cases where the key is
not finite, but is supplied constantly, as in the Vernam system
and the running key cipher. In such cases we may define
equivocation "rates". One considers the equivocation Q(N)
of the message when N letters have been intercepted. The
equivocation rate for the message is defined as the limit
(assuming it exists):

    $\lim_{N \to \infty} \frac{Q(N)}{N} = Q'$

The rate for equivocation of key would be defined similarly,
using the equivocation in the part of the key that has been
used only, but of course these two are the same. There are
results for these parameters analogous to those obtained
with finite key cases. Let R' be the mean rate of using
key.

Theorem 23:

    $Q' \le R'$

In case the equality holds we have the analogue of ideal
systems where the complete information of the key goes into
equivocation. If R' ≥ R, the rate of the message source,
we can obtain perfect secrecy - in fact we may define
perfect secrecy as the case in which Q' = R.

In the random case we have the analogous result

    $Q' = R' - D$.

31.    Further Remarks on Equivocation and Redundancy.

We have taken the redundancy of "normal English"
to be about .7 digits per letter or 50% of $R_0$. This is on
the assumption that word divisions were omitted. It is an
approximate figure based on statistical structure of the
order of lengths of perhaps 8 letters, and assumes the text
to be of an ordinary type, such as newspaper writing,
literary work, etc. Various methods of calculating
redundancy have been devised and will be described in the
memorandum on information mentioned in the
introduction. We may note here two methods of roughly estimating
this number which are of cryptographic interest.

A running key cipher is a Vernam type system where
in place of a random sequence of letters the key is a
meaningful text. Now it is known that running key ciphers
can usually be solved uniquely. This shows that English
can be reduced by a factor of two to one and implies a
redundancy of at least 50%. This figure cannot be reduced
very much, however, for a number of reasons, unless long
range "meaning" structure of English is considered.

The running key cipher can be easily improved to
lead to ciphering systems which could not be solved without
the key. If one uses in place of one English text, about
4 different texts as key, adding them all to the message,
a sufficient amount of key has been introduced to produce
a high positive equivocation rate. Another method would
be to use say every 10th letter of the text as key. The
intermediate letters are omitted and cannot be used at any
other point of the message. This has the same effect, since
the mean rate for these spaced letters must be over .8 $R_0$.

These methods might be useful for spies or diplomats
who could use books or magazines for the key source.
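
The second of these methods can be sketched in a few lines. The sketch assumes a 26-letter alphabet with spaces and punctuation dropped, addition of key and message letters mod 26 as in the Vernam type system, and an arbitrary illustrative key text; the function names are hypothetical.

```python
import string

ALPHABET = string.ascii_uppercase

def letters_only(text):
    return [c for c in text.upper() if c in ALPHABET]

def spaced_key(key_text, step=10):
    """Take every step-th letter of the key text; the intermediate
    letters are discarded and never used."""
    return letters_only(key_text)[::step]

def add_key(text, key, sign=+1):
    """Add (sign=+1) or subtract (sign=-1) the key letters mod 26."""
    letters = letters_only(text)
    if len(key) < len(letters):
        raise ValueError("key text too short for this message")
    return "".join(
        ALPHABET[(ALPHABET.index(m) + sign * ALPHABET.index(k)) % 26]
        for m, k in zip(letters, key)
    )

KEY_TEXT = ("It was the best of times, it was the worst of times, "
            "it was the age of wisdom")
key = spaced_key(KEY_TEXT)            # six key letters from 58 letters of text
cryptogram = add_key("ATTACK", key)
print(cryptogram, add_key(cryptogram, key, sign=-1))
```

The cost is apparent in the example: ten letters of key text are consumed for each letter of message.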

A second way of showing the high redundancy of
English is to delete all vowels from a passage. In general
it is possible to fill them in again uniquely and recover
the original, without knowing it in advance. As the vowels
constitute about 40% of the text this puts a limit on the
redundancy. Actually there is considerable redundancy left,
the various letter and digram frequencies being far from
uniform.

This suggests a simple way of greatly improving
almost any simple ciphering system. First delete all vowels,
or as much of the message as possible without running the
risk of multiple solutions, and then encipher the residue.
Since this reduces the redundancy by a factor of perhaps
3 or 4 to 1, the unicity point will be moved out by this
factor. This is one way of approaching ideal systems -
using the decipherer's knowledge of English as part of the
deciphering system.

Two extremes of redundancy in English prose are
represented by Basic English and Joyce's "Finnegans Wake".
The Basic English vocabulary consists of only 850 words
and a rough estimate puts the redundancy at about 70%.
A cipher applied to this sort of text would rapidly approach
unicity. Joyce, on the other hand, would be relatively
ea[...]. The small redundancy is disclosed by the
difficulty in filling in correctly even a single missing letter
from "Finnegans Wake". What the numerical value is would
be difficult to determine; it varies widely throughout the
book.

The mathematical extremes of redundancy, 0 and 100%,
can be constructed in artificial languages. At 100% redundancy
we have, e.g., a single possible message. Q(M) is 0
identically and Q(K) in the random cipher case declines as
rapidly as possible, i.e., as rapidly as one sends
information on the system. In the other extreme all letter sequences
are equally likely, and any closed ciphering system is ideal.

We may refer here to a memorandum by Nyquist
(Enciphering - Effect of Redundancy in Language, May 30, 1944)
in which some questions of the type we are considering here
are discussed.

32.    Distribution  of  Equivocation. 

A more complete description of a secrecy system
applied to a language than is afforded by the equivocation
characteristics can be found by giving the distribution
of equivocation. For N intercepted letters we consider
the fraction of cryptograms for which Q (for these
particular E's, not the mean Q) lies between certain limits. This
gives a density distribution function

    $P(Q, N)\, dQ$

for the probability that for N letters Q lies between the
limits Q and Q + dQ. The mean equivocation we have previously
studied is the mean of this distribution.


The function P(Q,N) can be thought of as plotted along a
third dimension, normal to the paper, on the Q,N plane. If
the language is pure, with a small influence range
(compared to K) and the cipher is pure the function P(Q,N) will
usually be a ridge in this plane whose highest point follows
approximately the mean Q at least until near the unicity
point. In this case, or when the conditions are nearly
verified, the mean Q curve gives a reasonably complete picture
of the system.

On the other hand, if the language is not pure,
but made up of a set of pure components

    $L = \sum_i p_i L_i$

having different equivocation curves with the system, say
$Q_1, Q_2, \ldots, Q_n$, then the total Q distribution will usually be
made up of a series of ridges. There will be one for each $L_i$
weighted in accordance with its $p_i$. The mean equivocation
characteristic will be a line somewhere in the midst of these
ridges and may not give a very complete picture of the
situation. This is shown in Fig. 21.

A similar effect occurs if the system is not pure
but made up of several systems with different Q curves.
There is then a series of ridges in the P(Q,N) plot, and
the mean Q strikes an average which may lie between ridges
and be a very improbable value of Q for a particular
cryptogram. These effects are illustrated in Fig. 22.

The effect of mixing pure languages which are
near to one another in statistical structure is to increase
the width of the ridge. Near the unicity point this tends
to raise the mean equivocation, since equivocation cannot
become negative and the spreading is chiefly in the positive
direction. We expect, therefore, that in this region the
calculations based on the random cipher should be somewhat
low.




Practical Secrecy

33.    The Work Characteristic

After the unicity point has been passed there will
usually be a unique solution to the cryptogram. The problem
of isolating this single solution of high probability is the
problem of cryptanalysis. In the region before the unicity
point we may say that the problem of cryptanalysis is that of
isolating all the possible solutions of high probability
(compared to the remainder) and determining their various
probabilities.

Although it is always possible in principle to
determine these solutions (by trial of each possible key for example)
different enciphering systems show a wide variation in the amount
of work required. The average amount of work to determine the
key for a cryptogram of N letters, W(N), measured say in man hours,
may be called the work characteristic of the system. This
average is taken over all messages and all keys with their
appropriate probabilities.

For a simple substitution on English the work and
equivocation characteristics would be somewhat as shown in
Fig. 23. The dotted portion of the curve is where there are
numerous possible solutions and these must all be determined.
In the solid portion after the unicity point only one solution
exists in general, but if only the minimum necessary data are
given a great deal of work must be done to isolate it. As
more material is used the work rapidly decreases toward some
asymptotic value - where the additional data no longer reduces
the labor.

This is the work characteristic for the key. It is
clear that after the unicity point this function can never
increase. There is also a work characteristic for the message -
the average amount of work to determine the message (or all
reasonable messages). This will, in ordinary cases, be below
or at any rate not far above the work characteristic for the
key, out to fairly large N, since generally if the key is
determined it is easy to find M by the deciphering transformation.
For very large N, however, this function will increase due
merely to the labor of deciphering the large amount of
intercepted material.

Essentially the behavior shown in Fig. 23 can be
expected with any type of secrecy system where the equivocation
approaches zero. The scale of man hours required, however,
will differ greatly with different types of ciphers, even when
the Q curves are about the same. A Vigenere or compound
Vigenere, for example, with the same key size would have a much
better (i.e., much higher) work characteristic. A good practical
secrecy system is one in which the W(N) curve remains
sufficiently high, out to the number of letters one expects to transmit
with the key, to prevent the enemy from actually carrying out
the solution, or to delay it to such an extent that the
information is obsolete.

We will consider in the following sections ways of
keeping the function W(N) large, even though Q(K) may be
practically zero. This is essentially a "max min" type of problem, as
is always the case when we have a battle of wits.* In
designing a good cipher we must maximize the minimum amount of work
the enemy must do to break it. It is not enough merely to
be sure none of the standard methods of cryptanalysis work -
we must show that no method whatever will break the system
easily. This, in fact, has been the weakness of many systems;
they were designed to resist all the known methods of solution,
but had a structure leading to new methods which, applied to
them, disclosed weaknesses of their own.

[...] design is essentially one [...] in a field.

[...] sure that a system which is not [...]

(1) We can study the possible methods of solution available
to the cryptanalyst and attempt to describe them in sufficiently
general terms to cover any methods he might use. We then
construct our system to resist this "general" method of solution.
(2) We may construct our ciphers in such a way that breaking them
is equivalent to (or requires at some point in the process) the
solution of some problem known to be laborious. Thus, if we
could show that solving a system requires at least as much work
as solving a system of simultaneous equations in a large number
of unknowns, of a complex type, then we will have a lower bound
of sorts for the work characteristic.

The next three sections are aimed at these general
problems. It is difficult to define the pertinent ideas
involved with sufficient precision to obtain results in the form
of mathematical theorems, but it is believed that the conclusions,
in the form of general principles, are correct.

*See von Neumann and Morgenstern, "Theory of Games". The
situation between the cipher designer and cryptanalyst can be
thought of as a "game" of a very simple structure; a zero-sum two
person game with complete information, and just two "moves". The
cipher designer chooses a system for his "move". Then the
cryptanalyst is informed of this choice and chooses a method of
analysis. The "value" of the play is the average work required to
break a cryptogram in the system by the method chosen.

34.    Generalities on the Solution of Cryptograms.

After the unicity distance has been exceeded in
intercepted material, any system can be solved in principle by merely
trying each possible key until the unique solution is obtained,
i.e., a deciphered message which "makes sense" in clear. A simple
calculation shows that this method of solution (which we may call
complete trial and error) is totally impractical except when the
key is absurdly small.

Suppose, for example, we have a key of 26! possibilities
or about 26.3 digits, the same size as in simple substitution on
English. This is, by any significant measure, a small key. It
can be written on a small slip of paper, or memorized in a few
minutes. It could be registered on 27 switches each having ten
positions or on 88 two position switches.

Suppose further, to give the cryptanalyst every
possible advantage, that he constructs an electronic device to try
keys at the rate of one each microsecond (perhaps automatically
selecting from the results by a $\chi^2$ test for statistical
significance). He may expect to reach the right key about half way
through, and after an elapsed time of about

    $\frac{2 \times 10^{26}}{2 \times 60^2 \times 24 \times 365 \times 10^6} = 3 \times 10^{12}$ years.


In other words, even with a small key complete trial
and error will never be used in solving cryptograms, except in
the trivial case where the key is extremely small, e.g., the
Caesar with only 26 possibilities, or 1.4 digits. The trial
and error which is used so commonly in cryptography is of a
different sort, or is augmented by other means. If one had a
secrecy system which required complete trial and error it would
be extremely safe. Such a system would result, it appears, if
the original messages, all say of 1000 letters, were a random
selection of $2^{RN}$ from the set of all $2^{R_0 N}$ sequences of 1000
letters. If any of the simple ciphers were applied to these
it seems that little improvement over complete trial and error
would be possible.
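
The elapsed-time estimate for complete trial and error is easy to reproduce. The sketch below follows the stated assumptions (one key tried each microsecond, the right key reached about half way through, a 365-day year); the function name is hypothetical. It evaluates the time both for the rounded key count of 2 x 10^26 and for the exact value of 26!, which is about 4 x 10^26.

```python
import math

TRIALS_PER_SECOND = 10**6                 # one key each microsecond
SECONDS_PER_YEAR = 60**2 * 24 * 365

def expected_years(num_keys):
    """Expected time to reach the right key, half way through the key space."""
    return (num_keys / 2) / (TRIALS_PER_SECOND * SECONDS_PER_YEAR)

exact_keys = math.factorial(26)
print(f"26! = {exact_keys:.3e} keys, {math.log10(exact_keys):.1f} digits")
print(f"rounded key count: {expected_years(2e26):.1e} years")
print(f"exact 26!:         {expected_years(exact_keys):.1e} years")
print(f"Caesar, 26 keys:   {expected_years(26) * SECONDS_PER_YEAR:.1e} seconds")
```

Either way the time is of the order of 10^12 years, while the 26 keys of the Caesar are exhausted in microseconds.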

The methods actually used often involve a great deal of
trial and error, but in a different way. First, the trials
progress from more probable to less probable hypotheses, and
second, each trial disposes of a large group of keys, not a
single one. Thus the key space may be divided into say 10
subsets, each containing about the same number of keys. By
at most 10 trials one determines which subset is the correct
one. This subset is then divided into several secondary
subsets and the process repeated. With the same key size
(|K| = 26! = 2 x 10^26) we would expect about 26 x 5 or 130 trials
as compared to 10^26 by complete trial and error. The
possibility of choosing the most likely of the subsets first for
test would improve this result even more. If the divisions
were into two compartments (the best way) only 90 trials would
be required. Whereas complete trial and error requires trials
to the order of the number of keys, this subdividing trial
and error requires only trials to the order of the key size
in alternatives.
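
These trial counts can be checked with a short calculation. The sketch assumes the key count of 2 x 10^26 used in the text and, for the ten-way division, an average of about 5 trials to find the right subset at each level; the function name is hypothetical.

```python
import math

def subdivision_levels(num_keys, s):
    """Number of successive S-way subdivisions needed to isolate one key
    among num_keys equally likely keys: log(num_keys) / log(s)."""
    return math.log(num_keys) / math.log(s)

NUM_KEYS = 2e26

ten_way_levels = subdivision_levels(NUM_KEYS, 10)   # about 26 levels
ten_way_trials = 5 * ten_way_levels                 # about 5 trials per level
two_way_trials = subdivision_levels(NUM_KEYS, 2)    # one trial per level

print(f"10-way: {ten_way_levels:.1f} levels, roughly {ten_way_trials:.0f} trials")
print(f"2-way:  roughly {two_way_trials:.0f} trials")
print(f"complete trial and error: of the order of {NUM_KEYS:.0e} trials")
```

The two-way figure is simply the key size expressed in alternatives (binary digits), which is the sense of the closing remark above.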

This remains true even when the different keys have
different probabilities. The proper procedure then to minimize
the expected number of trials is to divide the key space into
subsets of equiprobability. When the proper subset is
determined, this is again subdivided into equiprobability subsets.
If this process can be continued the number of trials expected
when each division is into two subsets will be

    $h = \frac{|K|}{\log 2}$.

If each test has S possible results and each of these
corresponds to the key being in one of S equiprobability subsets,
then

    $h = \frac{|K|}{\log S}$

trials will be expected. The intuitive significance of these
results should be noted. In the two compartment test with
equiprobability, each test yields one alternative of
information as to the key. If the subsets have very different
probabilities, as in testing a single key in complete trial and error,
only a small amount of information is obtained from the test.
Thus with 26! equiprobable keys, a test of one yields only

    $-\left[ \frac{26!-1}{26!} \log \frac{26!-1}{26!} + \frac{1}{26!} \log \frac{1}{26!} \right]$

or about $10^{-25}$ alternatives of information. Dividing into S
equiprobability subsets maximizes the information obtained from
each trial at log S, and the expected number of trials is the
total information to be obtained, that is the key size, divided
by this amount.
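
The contrast between the two kinds of test can be evaluated numerically. The sketch below measures information in alternatives (logarithms to base 2) and takes the exact value of 26! for the number of keys; the function name is hypothetical. `math.log1p` is used because 1/26! is far smaller than floating-point precision near 1.

```python
import math

def information_per_test(p_yes):
    """Information, in alternatives, from a yes/no test whose 'yes'
    outcome has probability p_yes."""
    if p_yes <= 0 or p_yes >= 1:
        return 0.0
    p_no = 1 - p_yes
    # log1p(-p_yes) = ln(1 - p_yes), accurate even for very small p_yes
    return -(p_yes * math.log2(p_yes) + p_no * math.log1p(-p_yes) / math.log(2))

single_key = information_per_test(1 / math.factorial(26))
half_split = information_per_test(0.5)

print(f"testing one key out of 26!: {single_key:.1e} alternatives")
print(f"two equiprobable subsets:   {half_split:.1f} alternative")
```

The single-key test yields a few parts in 10^25 of an alternative, against one full alternative for an even split.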

The question here is similar to various coin weighing
problems that have been circulated recently. A typical
example is the following: It is known that one coin in 27 is
counterfeit, and slightly lighter than the rest. A chemist's
balance is available and the counterfeit coin is to be isolated
by a series of weighings. What is the least number of weighings
to do this? The correct answer is 3, obtained by first
dividing the coins into three groups of 9 each. Two of these
are compared on the balance. The three possible results
determine the set of 9 containing the counterfeit. This set is
then divided into 3 subsets of 3 and the process continued.
The set of coins corresponds to the set of keys, the counterfeit
coin to the correct key, and the weighing procedure to a trial
or test.
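
The weighing procedure can be sketched as a repeated three-way split. The function name and the integer weights below are illustrative assumptions; the only thing taken from the text is the rule of comparing two equal groups and keeping whichever group the balance singles out.

```python
def find_light_coin(weights):
    """Locate the single lighter coin by repeated three-way splits.

    Returns (index of the light coin, number of weighings used).
    """
    candidates = list(range(len(weights)))
    weighings = 0
    while len(candidates) > 1:
        third = (len(candidates) + 2) // 3        # size of each pan's group
        left = candidates[:third]
        right = candidates[third:2 * third]
        rest = candidates[2 * third:]             # coins left off the balance
        left_weight = sum(weights[i] for i in left)
        right_weight = sum(weights[i] for i in right)
        weighings += 1
        if left_weight < right_weight:
            candidates = left
        elif right_weight < left_weight:
            candidates = right
        else:
            candidates = rest                     # balance level: coin is aside
    return candidates[0], weighings

coins = [10] * 27
coins[13] = 9                                     # the counterfeit
print(find_light_coin(coins))
```

Each weighing has three equiprobable outcomes, so it yields log 3 of information, and log 27 / log 3 = 3 weighings suffice wherever the counterfeit lies.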


This method of solution is feasible only if the key
space can be divided into a small number of subsets, with a
simple method of determining to which subset the correct key
belongs. Stated in another way, it is possible to solve for
the key bit by bit. One does not need to assume a complete key
in order to apply a consistency test and determine if the
assumption is justified - an assumption on a part of the key
(or as to whether the key is in some large section of the key
space) can be tested.

This is one of the greatest weaknesses of most ciphering
systems. For example, in simple substitution, an assumption
on a single letter can be checked against its frequency, variety
of contact, doubles or reversals, etc. In determining a single
letter the key space is reduced by 1.4 digits from the original
26. The same effect is seen in all the elementary types of
ciphers. In the Vigenère, the assumption of two or three
letters of the key is easily checked by deciphering at other
points with this fragment and seeing whether clear emerges.
The compound Vigenère is much better from this point of view,
if we assume a fairly large number of component periods,
producing a repetition rate larger than will be intercepted.
Here as many key letters are used in enciphering each letter
as there are periods - although this is only a fraction of the
entire key, at least a fair number of letters must be assumed
before a consistency check can be applied.

Our first conclusion then, regarding practical small
key cipher design, is that a considerable amount of key should
be used in enciphering each small element of the message.

35.    Statistical Methods

It is possible to solve many kinds of ciphers by
statistical analysis. Consider again simple substitution.
The first thing a cryptographer does with an intercepted
cryptogram is to make a frequency count. If the cryptogram
contains say 200 letters it is safe to assume that few, if
any, letters are out of their frequency groups, this being
a division into 4 sets of well defined frequency limits. The
log of the number of keys within this limitation may be
calculated as

\[ \log\,(2!\; 9!\; 9!\; 6!) = 14.28 \]

and the simple frequency count thus reduces the key uncertainty
by 12 digits, a tremendous gain.
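
The figures quoted here can be reproduced directly. The sketch below assumes logarithms to base 10 ("digits") and takes the four frequency groups to have sizes 2, 9, 9 and 6, as read from the formula; the variable names are illustrative.

```python
import math

def log10_factorial(n):
    """Logarithm (base 10) of n!, i.e. the size in digits of a set of n! keys."""
    return math.log10(math.factorial(n))

group_sizes = (2, 9, 9, 6)       # letters per frequency group, as read from the text

# Keys still possible once every letter is pinned to its frequency group:
# letters may only be permuted within a group.
remaining = sum(log10_factorial(g) for g in group_sizes)

# Full key size of simple substitution on 26 letters.
total = log10_factorial(sum(group_sizes))

reduction = total - remaining
print(round(remaining, 2), round(total, 2), round(reduction, 2))
```

The difference between the full key size and the remaining uncertainty comes out slightly above 12 digits, in line with the statement in the text.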


In general, a statistical attack proceeds as follows.
A certain statistic is measured on the intercepted cryptogram
E. This statistic is such that for all reasonable M it assumes
about the same value, S_K, the value depending only on the
particular key K that was used. The value thus obtained serves
to limit the possible keys to those which would give values
of S in the neighborhood of that observed. A statistic which
does not depend on K or which varies as much with M as with K
is not of value in limiting K. Thus in transposition ciphers,
the frequency count of letters gives no information about K -
every K leaves this statistic the same. Hence one can make
no use of a frequency count in breaking transposition ciphers.
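
That a letter count carries no information about a transposition key is easy to confirm. The sketch below assumes a simple block transposition of period 5 in which the key is a permutation of positions within each block; the function name and the sample message are illustrative.

```python
from collections import Counter
from itertools import permutations

def transpose(message, key):
    """Permute the letters inside successive blocks of len(key) letters.

    key is a permutation of range(len(key)); output position i of each
    block receives the letter at input position key[i].
    """
    period = len(key)
    if len(message) % period != 0:
        raise ValueError("message length must be a multiple of the period")
    out = []
    for start in range(0, len(message), period):
        block = message[start:start + period]
        out.extend(block[k] for k in key)
    return "".join(out)

message = "ATTACKATDAWNXYZ"

# Collect the letter count of the cryptogram under every possible key.
distinct_counts = {
    frozenset(Counter(transpose(message, key)).items())
    for key in permutations(range(5))
}
print(len(distinct_counts))
```

All 120 keys give the same frequency count, namely that of the message itself, so the statistic cannot separate one key from another.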

More precisely one can ascribe a "solving power" to
a given statistic S. For each value of S there will be a
conditional equivocation of the key Q_S(K), the equivocation
when S has its particular value and that is all that is known
concerning the key. The weighted mean of these values

\[ \sum P(S)\, Q_S(K) \]

gives the mean equivocation of the key when S is known, P(S)
being the a priori probability of the particular value S. The
key size |K| less this mean equivocation measures the "solving
power" of S.
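
A toy calculation of solving power may make the definition concrete. The sketch assumes equiprobable keys and a statistic whose value is fixed by the key alone, so that Q_S(K) is just the log of the number of keys sharing the observed value; both simplifications, and all names, are assumptions of the example rather than part of the text.

```python
import math
from collections import Counter

def solving_power(statistic_of_key):
    """Key size |K| minus the mean equivocation sum P(S) Q_S(K), in digits.

    statistic_of_key maps each (equiprobable) key to the value of the
    statistic that key produces.
    """
    n_keys = len(statistic_of_key)
    key_size = math.log10(n_keys)
    group_sizes = Counter(statistic_of_key.values())
    # P(S) = size / n_keys, Q_S(K) = log10(size) for equiprobable keys
    mean_equivocation = sum(
        (size / n_keys) * math.log10(size) for size in group_sizes.values()
    )
    return key_size - mean_equivocation

# Eight keys; the statistic splits them into two groups of four.
halving = {key: key // 4 for key in range(8)}
# A statistic that ignores the key entirely.
useless = {key: 0 for key in range(8)}

print(solving_power(halving), solving_power(useless))
```

The halving statistic has a solving power of log 2, while a statistic that takes the same value for every key has none.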

In a strongly ideal cipher all statistics of the
cryptogram are independent of the particular key used. This is
the measure preserving property of T_j T_k^{-1} on the E space or
T_j^{-1} T_k on the M space mentioned above.

There are good and poor statistics, just as there are
good and poor methods of trial and error. Indeed the trial and
error testing of hypothesis is a type of statistic, and what
was said above regarding the best types of trials holds generally.
A good statistic for solving a system must have the following
properties:

1.  It must be simple to measure.

2.  It must depend more on the key than on the message
    if it is meant to solve for the key. The variation
    with M should not mask its variation with K.

3.  The values of the statistic that can be "resolved"
    in spite of the "fuzziness" produced by variation
    in M should divide the key space into a number of
    subsets of comparable probability, with the statistic
    specifying the one in which the correct key
    lies. The statistic should give us sizable
    information about the key, not a tiny fraction of an
    alternative.

4.  The information it gives must be simple and usable.
    Thus the subsets in which the statistic locates the
    key must be of a simple nature in the key space.

Frequency count for simple substitution is an
example of a very good statistic.

Two methods (other than recourse to ideal systems)
suggest themselves for frustrating a statistical analysis.
These we may call the methods of diffusion and confusion. In
the method of diffusion the statistical structure of M which
leads to its redundancy is "dissipated" into long range statistics
- i.e., into statistical structure involving long combinations
of letters in the cryptogram. The effect here is that the enemy
must intercept a tremendous amount of material to tie down this
structure, since the structure is evident only in blocks of very
small individual probability. Furthermore even when he has
sufficient material, the analytical work required is much greater
since the redundancy has been diffused over a large number of
individual statistics. An example of diffusion of statistics
is operating on a message M = m_1, m_2, m_3, ... with a "smoothing"
operation, e.g.

\[ y_n = \sum_{i=1}^{s} m_{n+i} \pmod{26} \]

adding s successive letters of the message to get a letter y_n.
One can show that the redundancy of the y sequence is the same
as that of the m sequence, but the structure has been dissipated.
Thus the letter frequencies in y will be more nearly equal than
in m, the digram frequencies also more nearly equal, etc. Indeed
any reversible operation which produces one letter out for
each letter in and does not have an infinite "memory" has an
output with the same redundancy as the input. The statistics
can never be eliminated without compression, but they can be
spread out.
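
A minimal sketch of the smoothing operation follows. It assumes letters are coded 0 to 25, uses 0-based list indices, and simply stops when the window of s letters runs off the end of the message; the helper repeat_rate (the sum of squared relative frequencies) is an illustrative way of measuring how uneven a frequency count is and is not taken from the text.

```python
from collections import Counter

def to_numbers(text):
    """Code the letters A..Z as 0..25, ignoring everything else."""
    return [ord(c) - ord("A") for c in text.upper() if "A" <= c <= "Z"]

def smooth(m, s):
    """y_n = m_{n+1} + ... + m_{n+s} (mod 26) for every n whose window fits."""
    return [sum(m[n + 1:n + 1 + s]) % 26 for n in range(len(m) - s)]

def repeat_rate(seq):
    """Sum of squared relative frequencies; 1/26 for a perfectly flat count."""
    counts = Counter(seq)
    total = len(seq)
    return sum((c / total) ** 2 for c in counts.values())

m = to_numbers("THE QUESTION HERE IS SIMILAR TO VARIOUS COIN WEIGHING PROBLEMS")
y = smooth(m, 4)
print(repeat_rate(m), repeat_rate(y))
```

On a long sample of natural language one would expect the repeat rate of y to lie closer to 1/26 than that of m; on a message as short as the one above the comparison is only suggestive.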

The method of confusion is to make the relation between
the simple statistics of E and the simple description of K a very
complex and involved one. In the case of simple substitution it
was easy to describe the limitation of K imposed by the letter
frequencies of E. If the connection is very involved and
confused the enemy can still evaluate a statistic S_1 say which limits
the key to a region of the key space. This limitation, however,
is to some complex region R_1 in the space - folded over many times -
and he has a difficult time making use of it. A second statistic
S_2 limits K still further to R_2, hence it lies in the intersection
region R_1 R_2, but this does not help much because it is so
difficult to determine just what the intersection is.

To be more precise let us suppose the key space has
certain "natural coordinates" k_1, k_2, ..., k_p which he wishes to
determine. He measures a set of statistics s_1, s_2, ..., s_n and these
are sufficient to determine the k_i. However, in the method of
confusion, the equations connecting these sets of variables are
involved and complex. We have, say,

\[
\begin{aligned}
f_1(k_1, k_2, \ldots, k_p) &= s_1 \\
&\;\;\vdots \\
f_n(k_1, k_2, \ldots, k_p) &= s_n
\end{aligned}
\]

and all the f_i involve all the k_i. The cryptographer must
solve this system simultaneously - a difficult job. In the
simple (not confused) cases the functions involve only a
small number of the k_i - or at least some of these do. One
first solves the simpler equations, evaluating some of the
k_i and substitutes these in the more complicated equations.

The conclusion here is that for a good ciphering
system steps should be taken either to diffuse or confuse
the redundancy (or both).

36.    The Probable Word Method

One of the most powerful tools for breaking ciphers
is the use of probable words. The probable words may be
words or phrases expected in the particular message due to
its source, or they may merely be common words or syllables
which occur in any text in the language, such as the, and,
tion, that, etc.

In general, the probable word method is used as
follows. Assuming a probable word to be at some point in
the clear, the key or a part of the key is determined. This
is used to decipher other parts of the cryptogram and provide
a consistency test. If the other parts come out in clear,
the assumption is justified.
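
As an illustration of the method on a simple Vigenère, the sketch below assumes a probable word at a given position and reads off the key letters that assumption implies. The message, key and function names are invented for the example.

```python
A = ord("A")

def shift(text, key, sign):
    """Add (sign=+1) or subtract (sign=-1) the repeated key, letter by letter."""
    return "".join(
        chr((ord(t) - A + sign * (ord(key[i % len(key)]) - A)) % 26 + A)
        for i, t in enumerate(text)
    )

def encipher(message, key):
    return shift(message, key, +1)

def decipher(cryptogram, key):
    return shift(cryptogram, key, -1)

def key_fragment(cryptogram, probable_word, position):
    """Key letters implied by assuming probable_word starts at position."""
    segment = cryptogram[position:position + len(probable_word)]
    return "".join(
        chr((ord(c) - ord(w)) % 26 + A) for c, w in zip(segment, probable_word)
    )

message = "ATTACKTHEBRIDGEATDAWN"
cryptogram = encipher(message, "LEMON")

# Suppose the word BRIDGE is expected somewhere in the clear.
fragment = key_fragment(cryptogram, "BRIDGE", 9)
print(fragment)                      # a rotation of the key, repeating
print(decipher(cryptogram, "LEMON"))
```

When the word is placed correctly the recovered fragment repeats with the key period, and deciphering elsewhere with it yields clear text; a wrong placement generally gives a fragment with no such structure, which is the consistency test described above.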

There are few of the classical type ciphers that
use a small key and can resist long under a probable word
analysis. From a consideration of this method we can frame
a test of ciphers which might be called the acid test. It
applies only to ciphers with a small key (less than say 50
digits), applied to natural languages, and not using the
ideal method of gaining secrecy. The acid test is this:
How difficult is it to determine the key or a part of the
key knowing a sample of message and corresponding cryptogram?
Any system in which this is easy cannot be very resistant,
for the cryptanalyst can always make use of probable words,
combined with trial and error, until a consistent solution
is obtained.

The conditions on the size of the key make the
amount of trial and error small, and the condition about
ideal systems is necessary, since these automatically give
consistency checks. The existence of probable words and
phrases is implied by the condition of natural languages.
Conversely, it seems reasonable that if the key is difficult
to obtain, knowing a text and its cryptogram, then the
system should be strong.


Note that this requirement by itself is not
contradictory to the requirements that enciphering and deciphering
be simple processes. Using functional notation we have
for enciphering

\[ E = f(K, M) \]

and for deciphering

\[ M = g(K, E). \]

Both of these may be simple operations on their arguments
without the third equation

\[ K = h(M, E) \]

being simple.

We may also point out that in investigating a new type
of ciphering system one of the best methods of attack is to
consider how the key could be determined if a sufficient
amount of M and E were given.

With a small key, the work required to solve a
system, given a large amount of data, may be expected to be
not more than a few orders of magnitude greater than the
work required to obtain the key from a small amount of data
when both M and E are known.

The same principle of confusion can be (and must be)
used here to create difficulties for the cryptanalyst.
Given M = m_1 m_2 ... m_s and E = e_1 e_2 ... e_s the cryptanalyst
can set up equations for the different key elements k_1, k_2, ..., k_r
(namely the enciphering equations)

\[
\begin{aligned}
e_1 &= f_1(m_1, m_2, \ldots, m_s;\, k_1, \ldots, k_r) \\
&\;\;\vdots \\
e_s &= f_s(m_1, m_2, \ldots, m_s;\, k_1, \ldots, k_r)
\end{aligned}
\]

All is known, we assume, except the k_i. Each of these equations
should therefore be complex in the k_i, and involve
many of them. Otherwise the enemy can solve the simple ones
and then the more complex ones by substitution.

From the point of view of increasing confusion, it
is desirable to have the f_i involve several m_i, especially
if these are not adjacent and hence less correlated. This
introduces the undesirable feature of error propagation,
however, for then each e_i will generally affect several m_i
in deciphering, and an error will spread to all these.

We conclude that much of the key should be used in
an involved manner in obtaining any cryptogram letter from
the message to keep the work characteristic high. Further a
dependence on several uncorrelated m_i is desirable, if some
propagation of error can be tolerated. We are led by all
three of the arguments of these sections to consider "mixing
transformations."

37.    Mixing Transformations

A notion that has proven valuable in certain branches
of probability theory is the concept of a "mixing transformation."
Suppose we have a probability or measure space Ω, and a
measure preserving transformation T of the space into itself,
i.e., a transformation such that the measure of a transformed
region TR is equal to the measure of the initial region R.
The transformation is called mixing if for any function f(P)
defined over the space, and any region R,

\[ \lim_{n \to \infty} \int_{T^n R} f(P)\, dP = \int_{R} dP \int_{\Omega} f(P)\, dP. \]

This means that any initial region R of the space under
successive applications of T is mixed into the entire space Ω
with uniform density. In general T^n R becomes a region
consisting of a large number of thin filaments spread
throughout the region. As n increases the filaments become finer
and their density more nearly constant.

An example of a mixing transformation is shown in
Fig. 21. Here measure is identified with Euclidean area.
The space is the triangle and TP is the point a fixed number of
units of distance above point P providing this does not go outside
the triangle. When the top of the triangle is reached a
point is transferred first to the point directly beneath,
and then over to the right an irrational fraction of the
base width. If this carries the point beyond the right edge
the extra distance is measured from the left edge. Successive
transforms of a square region are shown in Fig. 21. For n
very large the square is turned into a uniform grating of
nearly parallel thin strips covering the triangle.

A mixing transformation in this precise sense can
occur only in a space with an infinite number of points, for
in a finite point space the transformation must be periodic.
Speaking loosely, however, we can think of a mixing
transformation as one which distributes any reasonably cohesive
region in the space fairly uniformly over the entire space.
If the first region could be described in simple terms, the
second would require very complex ones. In the case of
cryptographic interest, the original region is all of a certain
simple statistical structure - after the mix the region
is distributed and the structure diffused and confused.
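
The remark that a transformation of a finite point space must be periodic can be seen on a small example. The sketch assumes the transformation is one-to-one, i.e. a permutation of the points, and computes the number of applications after which every point returns to its starting place; the function names and the sample permutation are illustrative.

```python
from math import gcd

def permutation_order(perm):
    """Smallest n > 0 such that applying perm n times moves no point.

    perm[i] is the image of point i; the order is the least common
    multiple of the cycle lengths.
    """
    seen = [False] * len(perm)
    order = 1
    for start in range(len(perm)):
        if seen[start]:
            continue
        length = 0
        point = start
        while not seen[point]:
            seen[point] = True
            point = perm[point]
            length += 1
        order = order * length // gcd(order, length)
    return order

def apply_n(perm, point, n):
    """Image of point after n applications of perm."""
    for _ in range(n):
        point = perm[point]
    return point

perm = [1, 2, 0, 4, 3]            # one cycle of length 3, one of length 2
n = permutation_order(perm)
print(n, [apply_n(perm, p, n) for p in range(len(perm))])
```

However scrambled a region becomes at intermediate steps, after this many applications it is back where it started, so the limit in the definition of mixing cannot hold exactly in a finite space.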

Good mixing transformations are often formed by
repeated products of two simple non-commuting operations.
See for example the mixing of pastry dough discussed by Hopf.*
The dough is first rolled out into a thin slab, then folded
over, then rolled, and then folded again, etc.

In a good mixing transformation of a space with
natural coordinates X_1, X_2, ..., X_S the point X_i is carried
by the transformation into a point X_i', with

\[ X_i' = f_i(X_1, X_2, \ldots, X_S), \qquad i = 1, 2, \ldots, S \]

and the functions f_i are complicated, involving all the
variables in a "sensitive" way. A small variation of any one,
X_3, say, changes all the X_i' considerably. If X_3 passes through
its range of possible variation the point X_i' traces a long
winding path around the space.


Various methods of mixing applicable to statistical
sequences of the type found in natural languages can be
devised. One which looks fairly good is to follow a preliminary
transposition by a sequence of alternating substitutions
and simple linear operations, adding adjacent letters mod 26
for example. Thus

\[ H = LSLSLT \]

where T is a transposition, L is a linear operation, and S is
a substitution.
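
A small sketch of a mix of the form H = LSLSLT is given below. Everything concrete in it is an assumption made for illustration: the substitution table and the transposition are drawn from a seeded random generator, the same substitution is used at both S steps, and the linear operation L adds to each letter the one before it (mod 26), leaving the first letter unchanged so that L is reversible.

```python
import random

def make_tables(seed, length):
    """Hypothetical substitution table (on 0..25) and block transposition."""
    rng = random.Random(seed)
    substitution = list(range(26))
    rng.shuffle(substitution)
    transposition = list(range(length))
    rng.shuffle(transposition)
    return substitution, transposition

def transpose(x, perm):
    return [x[p] for p in perm]

def untranspose(y, perm):
    x = [0] * len(y)
    for i, p in enumerate(perm):
        x[p] = y[i]
    return x

def linear(x):
    """Add each letter's left neighbor to it, mod 26."""
    return [x[0]] + [(x[i] + x[i - 1]) % 26 for i in range(1, len(x))]

def unlinear(y):
    x = [y[0]]
    for i in range(1, len(y)):
        x.append((y[i] - x[i - 1]) % 26)
    return x

def substitute(x, table):
    return [table[v] for v in x]

def unsubstitute(y, table):
    inverse = [0] * 26
    for plain, cipher in enumerate(table):
        inverse[cipher] = plain
    return [inverse[v] for v in y]

def mix(x, table, perm):
    """Apply T first, then L, S, L, S, L (the product LSLSLT read right to left)."""
    y = linear(transpose(x, perm))
    for _ in range(2):
        y = linear(substitute(y, table))
    return y

def unmix(y, table, perm):
    x = unlinear(y)
    for _ in range(2):
        x = unlinear(unsubstitute(x, table))
    return untranspose(x, perm)

table, perm = make_tables(seed=1, length=12)
block = list(range(12))
print(mix(block, table, perm))
```

With only three linear steps a change in one input letter reaches just a few neighboring output letters, so a practical mix would need more rounds or a wider linear operation; the sketch shows the structure, not a recommended design.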

*E. Hopf, On Causality, Statistics and Probability, Journal of
Math. and Physics, V.13, pp. 51-102, 1934.

38.    Ciphers of the Type T_k H S_j

Suppose that H is a good mixing transformation that
can be applied to sequences of letters and that T_k and S_j are
any two simple families of transformations, i.e., two simple
ciphers, which may be the same. For concreteness we may think
of them as both simple substitutions.

It appears that the cipher THS will be a very good
ciphering system from the standpoint of its work characteristic.
In the first place it is clear on reviewing our arguments about
statistical methods that no simple statistics will give information
about the key - any significant statistics derived from E
must be of a highly involved and very sensitive type - the
redundancy has been both diffused and confused by the mixing H.
Also probable words lead to a complex system of equations
involving all parts of the key (when the mix is good), which must be
solved simultaneously. The bad features of such a system are
propagation of errors and complexity of operations, both of
which get worse as the mixing of H gets better.

It is interesting to note that if the cipher T is
omitted the remaining system is similar to S and thus no
stronger. The enemy merely "unmixes" the cryptogram by
application of H^{-1} and then solves. If S is omitted the
remaining system is much stronger than T alone if the mix is good,
but still not comparable to THS.

The basic principle here of simple ciphers separated
by a mixing transformation can of course be extended. For
example one could use

\[ T_k H_1 S_j H_2 R_i \]

with two mixes and three simple ciphers. One can also simplify
by using the same ciphers, and even the same keys (inner
product) as well as the same mixing transformations. This
might well simplify the mechanization of such systems.

The mixing transformation which separates the two
(or more) appearances of the key acts as a kind of barrier for
the enemy - it is easy to carry a known element over this
barrier but an unknown (the key) does not go easily.

By supplying two sets of unknowns, the key for S and
the key for T, and separating them by the mixing transformation
H we have "tangled" the unknowns together in a way that makes
solution very difficult.

Although systems constructed on this principle
would be extremely safe they possess one grave disadvantage.
If the mix is good then the propagation of errors is bad.
A transmission error of one letter will affect several letters
on deciphering.


39.    The Compound Vigenère

In the compound Vigenère several keys of length d_1,
d_2, ..., d_S are written under the message and added to it
modulo 26 to obtain the cryptogram. The result is a Vigenère
with key of special type, whose repetition is of period d, the
least common multiple of d_1, d_2, ..., d_S. If we have three
keys of periods 2, 3, 5 the total period d is 30 and the total
key size (2+3+5) x 1.41 = 14.1 digits. The situation is then

\[
\begin{array}{lcl}
M   &=& m_1\; m_2\; m_3\; m_4\; m_5\; m_6\; \ldots \\
K_1 &=& a_1\; a_2\; a_1\; a_2\; a_1\; a_2\; \ldots \\
K_2 &=& b_1\; b_2\; b_3\; b_1\; b_2\; b_3\; \ldots \\
K_3 &=& c_1\; c_2\; c_3\; c_4\; c_5\; c_1\; \ldots \\
E   &=& e_1\; e_2\; e_3\; e_4\; e_5\; e_6\; \ldots
\end{array}
\]

with

\[
\begin{aligned}
e_1 &= m_1 + a_1 + b_1 + c_1 \\
e_2 &= m_2 + a_2 + b_2 + c_2 \\
&\;\;\vdots
\end{aligned}
\]

If we assume M and E known then, letting h_i = e_i - m_i,

\[
\begin{array}{ll}
a_1 + b_1 + c_1 = h_1 \qquad & a_2 + b_3 + c_1 = h_6 \\
a_2 + b_2 + c_2 = h_2 \qquad & a_1 + b_1 + c_2 = h_7 \\
a_1 + b_3 + c_3 = h_3 \qquad & a_2 + b_2 + c_3 = h_8 \\
a_2 + b_1 + c_4 = h_4 \qquad & a_1 + b_3 + c_4 = h_9 \\
a_1 + b_2 + c_5 = h_5 \qquad & a_2 + b_1 + c_5 = h_{10}
\end{array}
\]

These equations are easily solved for the key, although not as
easily as in the simple Vigenère or other simple ciphers. As
the number of constituent periods increases the solution becomes
more involved and time consuming. In any case we have
a system of simultaneous equations each involving S of the
total of B = Σ d_i unknowns. The unicity point will occur at about
2B letters and if several times this amount of material is
intercepted no great difficulty should be encountered in breaking
the cipher, providing S is not more than say 6 or 8. With the
first 9 primes as periods we have a key size of 100 letters or
about 141 digits, the unicity distance is about 200 letters and
the key does not repeat for 223,092,870 letters. This system,
although much better than such methods as simple substitution,
transposition and simple Vigenère with equivalent key size,
does not utilize the available key fully in making the
cryptanalyst work for the solution. The equations only involve S
of the B key unknowns and those in a simple fashion. The
equations easily combine and reduce to eliminate unknowns. If
a large amount of material is available, compared to the unicity
distance, particular sets of equations can be combined to
eliminate unknowns very easily. The system possesses the important
advantage, however, of not expanding errors. One incorrect
letter of cryptogram produces one incorrect letter of decipherment.
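
The period and key-size figures, and the fact that errors do not spread, can be checked with a short sketch. Letters are coded 0 to 25; the particular key values and function names are invented for the example.

```python
import math
from functools import reduce

def lcm(a, b):
    return a * b // math.gcd(a, b)

def total_period(periods):
    """Repetition period of the compound key: the least common multiple."""
    return reduce(lcm, periods)

def key_size_digits(periods):
    """Total number of key letters times log10(26), about 1.41 digits each."""
    return sum(periods) * math.log10(26)

def compound_shift(numbers, keys, sign):
    """Add (sign=+1) or subtract (sign=-1) every component key, mod 26."""
    return [
        (x + sign * sum(key[n % len(key)] for key in keys)) % 26
        for n, x in enumerate(numbers)
    ]

first_nine_primes = [2, 3, 5, 7, 11, 13, 17, 19, 23]
print(total_period([2, 3, 5]), round(key_size_digits([2, 3, 5]), 1))
print(sum(first_nine_primes), total_period(first_nine_primes))

# One corrupted cryptogram letter spoils exactly one deciphered letter.
keys = [[3, 7], [1, 4, 9], [2, 5, 8, 11, 20]]      # illustrative key letters
message = list(range(12))
cryptogram = compound_shift(message, keys, +1)
corrupted = cryptogram.copy()
corrupted[5] = (corrupted[5] + 1) % 26
decoded = compound_shift(corrupted, keys, -1)
errors = sum(1 for d, m in zip(decoded, message) if d != m)
print(errors)
```

Because each cryptogram letter depends on exactly one message letter, deciphering is letter-by-letter and a transmission error stays where it fell.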


By relatively simple changes this system could be
strengthened considerably. If the equations for the key
elements (with M and E known) could be made into higher degree
equations rather than linear ones the difficulty of solution
would increase tremendously. This could easily be done in
a mechanical device by successive multiplications (mod 26)
of the key letters according to some prearranged scheme.


40.    Incompatibility of the Criteria for Good Systems

The five criteria for good secrecy systems given in
section 12 appear to have a certain incompatibility when
applied to a natural language with its complicated statistical
structure. With artificial languages having a simple
statistical structure it is possible to satisfy all requirements
simultaneously, by means of the ideal type ciphers. In natural
languages it seems that a compromise must be made and the
valuations balanced against one another with a view toward
the particular application.

If any one of the five criteria is dropped, the
other four can be satisfied fairly well, as the following
examples show.

1.  If we omit the first requirement (amount of secrecy)
    any simple cipher such as simple substitution will do.
    In the extreme case of omitting this condition
    completely, no cipher at all is required and one sends
    the clear.

2.  If the size of the key is not limited the Vernam
    system can be used.

3.  If complexity of operation is not limited, various
    extremely complicated types of enciphering process
    can be used. The modified compound Vigenère described
    above with many different periods compounded is fairly
    satisfactory as an example here, although it falls
    down somewhat on the key size condition. Ideal systems
    and enciphered codes are also fair examples although
    not too good from the propagation of error point of view.

4.  If we omit the propagation of error condition systems
    of the type THS would be very good, although somewhat
    complicated.

5.  If we allow large expansion of message, various systems
    are easily devised where the "correct" message is mixed
    with many "incorrect" ones (misinformation). The key
    determines which of these is correct.

A rough argument for the incompatibility of the five
conditions may be given as follows.

From condition 5, secrecy systems essentially as
studied in this paper must be used; i.e., no great use of nulls,
etc. Perfect and ideal systems are excluded by condition 2
and by 3 and 4, respectively. The high secrecy required by 1
must then come from a high work characteristic, not from a
high equivocation characteristic. If the key is small, the
system simple, and the errors do not propagate, probable word
methods will generally solve the system fairly easily, since
we then have a fairly simple system of equations for the key.

This reasoning is too vague to be conclusive, but the
general idea seems quite reasonable. Perhaps if the various
criteria could be given quantitative significance, some sort of
an exchange equation could be found involving them and giving
the best physically compatible sets of values. The two most
difficult to measure numerically are the complexity of
operations, and the complexity of statistical structure of the
language.


Appendix 1

Deduction of -Σ p_i log p_i

It will be shown that the measure of choice
-Σ p_i log p_i is a logical consequence of three quite reasonable
assumptions about the desired properties of such a measure.
The three assumptions are:

(1)  There exists a function C(p_1, p_2, ..., p_n) continuous
in the p_i, measuring the amount of "choice" when there are
n possibilities with probabilities p_i.

(2)  C has the property that if a given choice be
broken down into two successive choices the total amount of
choice is the weighted sum of the individual choices. For
example, suppose the choice is from 4 possibilities A, B, C, D
with probabilities .1, .2, .3, .4. This can be broken down into
a preliminary choice between the pair A, B and the pair C, D.
Pair A, B has a total probability .1 + .2 = .3 and pair C, D
probability .3 + .4 = .7. If pair A, B is chosen a second choice
between A and B must be made with probabilities .1/(.1 + .2) = 1/3
and .2/(.1 + .2) = 2/3. If pair C, D is chosen a second choice
between C and D must be made with probabilities 3/7 and 4/7. Thus
broken down we have a preliminary amount of choice C(.3, .7) and .3
of the time a secondary choice of C(1/3, 2/3) while .7 of the
time the secondary choice is C(3/7, 4/7). Our condition requires
that the total choice C(.1, .2, .3, .4) be the same as the
weighted sum of the different choices when decomposed, weighted
in accordance with the frequency of occurrence. Thus we require
in this case

\[ C(.1, .2, .3, .4) = C(.3, .7) + .3\, C(\tfrac{1}{3}, \tfrac{2}{3}) + .7\, C(\tfrac{3}{7}, \tfrac{4}{7}). \]

(3)  If A(n) = C(1/n, 1/n, ..., 1/n), the choice
when there are n equally likely possibilities, then A(n) is
monotonic increasing in n.

Theorem. Under these three assumptions

\[ C(p_1, p_2, \ldots, p_n) = -K \sum p_i \log p_i \]

where K is a positive constant.
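
The decomposition example under assumption (2) can be checked numerically for the measure named in the theorem. The sketch takes K = 1 and natural logarithms, which is only a choice of unit; the function name is illustrative.

```python
import math

def choice(probabilities, K=1.0):
    """C(p_1, ..., p_n) = -K * sum of p_i log p_i (natural logarithm)."""
    return -K * sum(p * math.log(p) for p in probabilities if p > 0)

# Direct choice among A, B, C, D.
direct = choice([0.1, 0.2, 0.3, 0.4])

# Preliminary choice between the pairs, then a weighted secondary choice.
decomposed = (
    choice([0.3, 0.7])
    + 0.3 * choice([1 / 3, 2 / 3])
    + 0.7 * choice([3 / 7, 4 / 7])
)

# A(n): the choice among n equally likely possibilities.
equal_choices = [choice([1 / n] * n) for n in range(1, 8)]

print(direct, decomposed)
print(equal_choices)
```

The two totals agree to rounding error, and A(n) grows steadily with n, so the proposed measure satisfies assumptions (2) and (3) at least in this instance; the proof that follows shows that no other form does.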


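From condition (2) we can decompose a choice from s^m equally
likely possibilities into a series of m choices each from s
equally likely possibilities and obtain

\[ A(s^m) = m\, A(s). \]

Similarly

\[ A(t^n) = n\, A(t). \]

We can choose n arbitrarily large and find an m to satisfy

\[ s^m \le t^n < s^{m+1}. \]

Thus, taking logarithms and dividing by n log s,

\[ \frac{m}{n} \le \frac{\log t}{\log s} < \frac{m}{n} + \frac{1}{n} \qquad \text{or} \qquad \left| \frac{m}{n} - \frac{\log t}{\log s} \right| < \epsilon \]

where ε is arbitrarily small.
Now from the monotonic property of A(n)

\[ A(s^m) \le A(t^n) \le A(s^{m+1}) \]

\[ m\, A(s) \le n\, A(t) \le (m + 1)\, A(s). \]

Hence, dividing by nA(s),

\[ \frac{m}{n} \le \frac{A(t)}{A(s)} \le \frac{m}{n} + \frac{1}{n} \qquad \text{or} \qquad \left| \frac{m}{n} - \frac{A(t)}{A(s)} \right| < \epsilon \]

\[ \left| \frac{A(t)}{A(s)} - \frac{\log t}{\log s} \right| \le 2\epsilon, \qquad A(t) = K \log t \]

where K must be positive to satisfy (3).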

Now suppose we have a choice from n possibilities with
commensurable probabilities $p_i = n_i / \sum n_i$ where the $n_i$ are
integers. We can break down a choice from $\sum n_i$ possibilities
into a choice from n possibilities with probabilities $p_1, \ldots, p_n$
and then, if the ith was chosen, a choice from $n_i$ with equal
probabilities. Using condition (2) again, we equate the total
choice from $\sum n_i$ as computed by two methods:

\[ K \log \sum n_i = C(p_1, \ldots, p_n) + K \sum p_i \log n_i. \]

Hence

\[ C = K \left[ \sum p_i \log \sum n_i - \sum p_i \log n_i \right] = -K \sum p_i \log \frac{n_i}{\sum n_i} = -K \sum p_i \log p_i. \]

If the $p_i$ are incommensurable, they may be approximated by
rationals and the same expression must hold by our continuity
assumption. The choice of the coefficient K is a matter of
convenience and amounts to the choice of a unit of measure.
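The two ways of computing the total choice in the commensurable case can be compared directly. This is a minimal sketch under stated assumptions: K = 1, natural logarithms, the logarithmic form of C from the theorem, and illustrative integer counts; the names `choice` and `two_stage_choice` are hypothetical.

```python
import math

def choice(probs, K=1.0):
    """Amount of choice C(p_1, ..., p_n) = -K * sum(p_i * log p_i)."""
    return -K * sum(p * math.log(p) for p in probs if p > 0)

def two_stage_choice(counts, K=1.0):
    """Choose a group i with probability n_i / sum(n), then one of its
    n_i equally likely members: C(p_1..p_n) + K * sum(p_i * log n_i)."""
    total = sum(counts)
    probs = [n / total for n in counts]
    return choice(probs, K) + K * sum(p * math.log(n) for p, n in zip(probs, counts))

counts = [1, 2, 3, 4]              # illustrative n_i, so p_i = .1, .2, .3, .4
direct = math.log(sum(counts))     # K log(sum n_i) for equally likely outcomes
staged = two_stage_choice(counts)

print(direct, staged)
```

With these counts both values equal log 10, the choice among ten equally likely possibilities.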





Appendix  2 

Proof of Theorem 4

Select any message $M_1$ and group together all cryptograms
that can be obtained from $M_1$ by an enciphering operation
$T_i$. Let this class of cryptograms be $C_1'$. Group with $M_1$ all
messages that can be obtained from $M_1$ by $T_i^{-1} T_j M_1$, and call
this class $C_1$. The same $C_1'$ would be obtained if we started with
any other M in $C_1$ since

\[ T_s T_j^{-1} T_i M_1 = T_l M_1. \]

Similarly the same $C_1$ would be obtained.

Choosing an M (if any exist) not in $C_1$ we construct
$C_2$ and $C_2'$ in the same way. Thus we obtain the residue classes
with properties (1) and (2). Let $M_1$ and $M_2$ be in $C_1$ and suppose

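\[ M_2 = T_1 T_2^{-1} M_1. \]

If $E_1$ is in $C_1'$ and can be obtained from $M_1$ by

\[ E_1 = T_\alpha M_1 = T_\beta M_1 = \cdots = T_\eta M_1, \]

then

\[ E_1 = T_\alpha T_2 T_1^{-1} M_2 = T_\beta T_2 T_1^{-1} M_2 = \cdots = T_\lambda M_2 = T_\mu M_2 = \cdots \]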

Thus each $M_i$ in $C_1$ transforms into $E_1$ by the same number of keys.
Similarly each $E_i$ in $C_1'$ is obtained from any M in $C_1$ by the same
number of keys. It follows that this number of keys is a divisor
of the total number of keys and hence we have properties (3) and (4).


Appendix 3

Equivocation of Message for Random Cipher

As before let $M_1, \ldots, M_S$ be high probability messages
and $M_{S+1}, \ldots, M_H$ have zero probability. Let $P(m_1, m)$ be the
probability of just $m_1$ lines going from a particular E, say $E_1$,
to a particular high probability M, say $M_1$, with a total of m
lines to all high probability M. Then

\[ P(m_1, m) = \binom{k}{m} \binom{m}{m_1} \left( \frac{1}{H} \right)^{m_1} \left( \frac{S-1}{H} \right)^{m - m_1} \left( 1 - \frac{S}{H} \right)^{k - m}. \]
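As a sanity check on this distribution, the sketch below reads the display as a multinomial: each of the k lines from E goes independently to the chosen message with probability 1/H, to one of the other S - 1 high probability messages with probability (S - 1)/H, and elsewhere with probability 1 - S/H. That reading of the damaged formula is an assumption, as are the function name `p_lines` and the small illustrative values of k, S, and H.

```python
from math import comb

def p_lines(m1, m, k, S, H):
    """P(m1, m): m1 of the k lines reach one given high probability
    message and m lines in all reach the S high probability messages."""
    return (comb(k, m) * comb(m, m1)
            * (1 / H) ** m1
            * ((S - 1) / H) ** (m - m1)
            * (1 - S / H) ** (k - m))

k, S, H = 8, 3, 10   # illustrative values only

pairs = [(m1, m) for m in range(k + 1) for m1 in range(m + 1)]
total = sum(p_lines(m1, m, k, S, H) for m1, m in pairs)
mean_m1 = sum(m1 * p_lines(m1, m, k, S, H) for m1, m in pairs)

print(total, mean_m1)
```

Under this reading the probabilities sum to 1 and the expected $m_1$ is k/H, the value the appendix uses later.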

The probability of intercepting an E with m lines to high
probability M's is


'  ■  - 

The Q(M) expected can be thought of as contributed to by the
various $M_i$ in the high probability group. Thus $M_1$ contributes

\[ -\frac{m_1}{m} \log \frac{m_1}{m} = \frac{m_1}{m} \log \frac{m}{m_1} \]

if there are $m_1$ lines to $M_1$ and a total of m to high probability
M's. The expected Q is then

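\[ Q(M) = H S \sum_{m_1, m} P(m_1, m)\, \frac{m}{S k}\, \frac{m_1}{m} \log \frac{m}{m_1}. \]

The factor H sums over the various $E_i$ and the S sums over the
different $M_i$ (i = 1, ..., S). Hence,

\[ Q(M) = \frac{H}{k} \sum P(m_1, m)\, m_1 \left[ \log m - \log m_1 \right]. \]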

The term

\[ \sum P(m_1, m)\, m_1 \]

summed on $m_1$ gives the expected $m_1$ when m lines go to high
probability M's, i.e., m/S. Hence the first term is

\[ \frac{H}{S k} \sum m\, P(m) \log m = Q(K) \]

by our previous work. The second term is

\[ -\frac{H}{k} \sum P(m_1, m)\, m_1 \log m_1. \]

If the expected $m_1$ is $\ll 1$ this term is small since it vanishes
for $m_1 = 0$ or 1. The expected $m_1$ is k/H. Thus beyond this
point Q(M) approaches closely to Q(K). The point in question
is where $|K| = |M_0| = R_0 N$.



If the expected $m_1 \gg 1$ the $\log m_1$ can be taken out as
$\log \bar{m}_1 = \log (k/H)$ and we have

\[ -\frac{H}{k} \log \frac{k}{H} \sum P(m_1, m)\, m_1 = -\log \frac{k}{H} = |M_0| - |K|. \]

In this region then

\[ Q(M) = |M_0| - |K| + Q(K) \]

but here $Q(K) = |K| - |M_0| + |M|$, and therefore

\[ Q(M) = |M| = RN. \]

In the transition region $\bar{m}_1$ is about 1 and m will in
ordinary cases be very large. It is admissible then to replace
$P(m_1, m)$ by $P(m_1)$, since this will not depend on m to any extent
except for values of m of very small probability. Thus we obtain
for this region

\[ Q(M) = Q(K) - \frac{H}{k} \sum P(m_1)\, m_1 \log m_1. \]

The sum has the same form as our expression for Q(K) but with
1/H in place of S/H. The calculations for Q(K) can be used,
therefore %  with  only  a  change  of  '<  the^U  scale  byja  factor  of 

.  '•'  '  '"•  ^>-"~"  ^"'ft  *"  •'  '  i. '  J}'*' 


Appendix 4

Key Appearance in Simple Substitution with Independent Letters

If successive letters are chosen independently and
the different letters have probabilities $p_1, p_2, \ldots, p_S$, we can
calculate the expected number of different letters when N
letters have been intercepted. It is

\[ L(N) = S - \sum_i (1 - p_i)^N. \]

To prove this, imagine all possible sequences of N letters
written down, each with a frequency corresponding to its
probability, giving a total of say A sequences. Letter 1 does not
appear in $(1 - p_1)^N A$ of these; letter 2 does not appear in
$(1 - p_2)^N A$, etc. Therefore, the total number of letters missing
from sequences is

\[ A \sum_i (1 - p_i)^N. \]

Dividing by A gives us by definition the expected number of
missing letters from a random sequence, $\sum_i (1 - p_i)^N$. The number
of different letters expected in a sequence is the total number
of letters S minus this, giving the desired result.

If all the $p_i$ are equal this reduces to $S - S(1 - 1/S)^N$,
an exponential approach to S. In the general case there is a
series of exponentials with different time constants,
corresponding to different $p_i$, which are added to give L(N).
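The expected-count formula is easy to evaluate. The sketch below is illustrative only: the function name `expected_distinct` is hypothetical, and a uniform 26-letter alphabet stands in for real letter frequencies, which are not given in this appendix.

```python
def expected_distinct(probs, N):
    """Expected number of different letters among N independent letters:
    S - sum((1 - p_i) ** N), with S = len(probs)."""
    S = len(probs)
    return S - sum((1 - p) ** N for p in probs)

S = 26
uniform = [1 / S] * S   # assumption: equal probabilities, for illustration

for N in (0, 1, 10, 100):
    print(N, round(expected_distinct(uniform, N), 3))

# With equal probabilities the general formula reduces to S - S(1 - 1/S)^N.
closed_form = S - S * (1 - 1 / S) ** 10
```

No letters have appeared at N = 0, exactly one at N = 1, and the count approaches S as N grows.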

With the frequencies of normal English used for the
$p_i$, we obtain the curve shown in Fig. 25, along with an
experimental curve. The small discrepancy can be attributed to the
influences of nearby letters. (In English there is less tendency
to double letters than there would be if the letters were
independent but with the same probabilities. For English the
probability of a doubled digram is

\[ \sum_i p(i, i) = .0315 \]

while if letters were independent it would be

\[ \sum_i p_i^2 = .0670.) \]

Appendix 5

A Theoretical Case Where All Invariant Statistics of E Are
Independent of K

By an invariant statistic of a sequence of letters

\[ E = \ldots, m_{-2}, m_{-1}, m_0, m_1, m_2, m_3, \ldots \]

we will mean a statistic which is averaged along the length of
the sequence E. More precisely, a statistic of the form

\[ \lim_{n \to \infty} \frac{1}{2n+1} \left( F(E_{-n}) + \cdots + F(E_{-1}) + F(E) + F(E_1) + F(E_2) + \cdots + F(E_n) \right) \]

where F is any function whose argument is a possible sequence, and
$E_{\pm n}$ is the sequence E shifted n letters to the right or left.
Such statistics as the relative frequency of a given letter, of
a given n-gram, transition frequencies, and frequencies with
which letter i is followed by letter j at a distance n are all
invariant statistics.

We will describe a system in which every invariant
statistic which the cryptanalyst can construct from the (infinite)
intercepted E is independent of both K and M, and thus gives no
information to him. This effect and still more occurs with the
ideal ciphers of course, but here it is obtained independently of
the original message statistics and without any matching of the
cipher to the language.

Let N be a "random" sequence of letters:

\[ N = \ldots, n_{-2}, n_{-1}, n_0, n_1, n_2, n_3, \ldots \]

This is supposedly a known sequence (to the enemy) and thus a
part of the system, not of the key. Apply any simple cipher to
the message and then add N letter by letter to the result (mod
26). The "sum" is the enciphered message. It is evident that
any invariant statistic on E will be (with probability 1) the
same as that for a random sequence. Hence it is independent
of both K and M.

We need hardly add that such a system is easily
broken: the enemy merely subtracts N from E and then solves
the simple residual cipher, which may often be done with
invariant statistics.
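The construction, and the reason it is easily broken, can be sketched in a few lines. Everything named here is an assumption for illustration: a Caesar shift stands in for "any simple cipher", a seeded pseudo-random generator stands in for the known "random" sequence N, and the helper names are hypothetical.

```python
import random
import string

ALPHABET = string.ascii_uppercase

def shift_letters(text, offsets, sign):
    """Add (sign=+1) or subtract (sign=-1) offsets letter by letter, mod 26."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + sign * n) % 26]
        for c, n in zip(text, offsets)
    )

def caesar(text, key):
    """Stand-in simple cipher: shift every letter by the same key."""
    return shift_letters(text, [key] * len(text), +1)

message = "ATTACKATDAWN"
key = 3

rng = random.Random(0)                     # fixed seed: N is public, not part of the key
N = [rng.randrange(26) for _ in message]   # the known "random" sequence

E = shift_letters(caesar(message, key), N, +1)   # enciphered message
residual = shift_letters(E, N, -1)               # the enemy subtracts N from E
recovered = shift_letters(residual, [key] * len(residual), -1)

print(E, residual, recovered)
```

Subtracting N leaves exactly the simple cipher text, so the added sequence hides the statistics of E without adding any secrecy.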

Appendix  6 

Maximum  Repetition  Rate  in  Compound  Systems  for  a  Given  To- 

We consider briefly the question of how to arrange the
component periods in a compound Vigenère or Transposition system
to obtain the longest period for a given total key size. If the
component periods are $P_1, P_2, \ldots, P_s$ it is clear that they must
be coprime. Otherwise the total key, which is $\sum P_i$, could be
reduced without changing the period, which is the least common
multiple of the $P_i$, merely by deleting a factor which appears in
several of the $P_i$ from all but one. Also each P must be a power
of a prime, for if it contains two primes, it can be divided into
these parts, reducing the key and not affecting the period. Thus
the component periods are selections from the series of primes
and powers of primes

A: 2, 3, 4, 5, 7, 8, 9, 11, 13, 16, 17, 19, 23, 25, 27, ...

the selection being pairwise coprime.

It appears from empirical evidence that the best choice
of component periods for a given total size S is found by the
following process.

1. Determine the largest M such that $\sum_{i \le M} p_i \le S$ where the
$p_i$ are the primes in increasing order. This is the
maximum number of periods where the periods are
coprime, and is the number of periods to be used.

2. Choose from the sequence A, M elements, consecutive
except for the fact that no prime is represented more
than once, the M elements being as great as possible
with sum $\le S$.

3. If the sum is $< S$ move as many as possible of the
elements in this block up a notch in the sequence,
still satisfying the conditions on the sum and
coprimality.

4. Repeat 3 to either part of the original block if
possible. This process eventually ends and apparently
gives the proper decomposition.

For example with S = 50, the sum of the first 6
primes is 41, of the first 7 is 58. Hence 6 periods
will be used. We have

11 + 9 + 8 + 7 + 5 + 3 = 43

13 + 11 + 9 + 8 + 7 + 5 = 53

hence we start with the block 11, 9, 8, 7, 5, 3.
The elements 11, 9, 8, 7 can be moved up a notch
to give

13 + 11 + 9 + 8 + 5 + 3 = 49

No further improvement seems possible. We obtain

P = 13 x 11 x 9 x 8 x 5 x 3 = 154,440
The products and sums of the first n primes are given below.

n        1  2   3    4     5      6       7        8          9
p_n      2  3   5    7    11     13      17       19         23
Sum      2  5  10   17    28     41      58       77        100
Product  2  6  30  210  2310  30030  510510  9699690  223092870



Figures 1-25.

[Figures not reproduced. Surviving labels: Fig. 6; Fig. 8; Fig. 10; Fig. 16; Fig. 19; "Strongly ideal" (curve label); Fig. 20; Fig. 22; Fig. 23.]

September  19  ,  l*4&-ll£S-CX3-yO 


In classical mechanics one considers situations
where the state of a system is described by a set of numbers,
the coordinates of the phase space of the system, and the
dynamical behavior is controlled by a set of ordinary
differential equations. Such a system is entirely determinate; the
future is completely specified by the present state and the
dynamical equations, since these differential equations have,
in general, a unique solution passing through a given point.

In other branches of physics (heat flow, Brownian
motion, diffusion, etc.) there are situations which can be called
completely statistical. The path of a particle of gas is
described only statistically and no determinate or mean behavior
occurs. In this case one studies the flow of probability which
is described by a partial differential equation of the heat
flow type.

The present memorandum discusses a partial differential
equation in which both effects occur: there is a definite
"mean" motion of a system, determinate in character, carrying
its representative point through phase space in the classical
manner, with a superimposed statistical effect continually
perturbing it from this path.

In such a case the future coordinates of the system
cannot be precisely predicted; only a probability distribution
function can be determined for the future time, whose value
times the volume element dV is the probability that the system
will be in the volume element dV around the point in question.
For a short time the system is substantially determinate, the
distribution being concentrated around a point which moves
according to the determinate part of the equation. As the
statistical effects come into play this distribution broadens out and
in general approaches a limiting distribution which is
independent of the initial state of the system.

In some respects the situation is similar to that in
quantum mechanics, where systems are described only by
probabilities (or more precisely by wave functions whose squared amplitudes
are probabilities). There is this difference however; in quantum
mechanics even the initial state cannot be precisely described
due to the uncertainty principle. Conjugate variables cannot
both be measured simultaneously with exactness. In the systems
we consider here there are assumed to be no difficulties of this
nature: all coordinates can be simultaneously and precisely
measured. This corresponds to the difference in the fundamental
equation from that of quantum mechanics: Schrödinger's equation is
for the wave function $\psi$, while the equation considered here deals
directly with the probability density. Thus the present work is
adapted to "molar" statistical situations.

This sort of analysis may be expected to apply to
many problems where the actual situation is quite complicated
but a partial theoretical analysis is possible. This partial
analysis is used for the determinate part of the equation, and
the other complex disturbing effects are treated statistically.
Such situations may occur in economics, sociology, history, etc.
as well as in many engineering and physical problems.

G. R. Stibitz in a series of memoranda has considered
a similar problem in connection with the stability of a periodically
closed servo system. In his case the phase space of the system
consisted of a set of discrete points, and the fundamental
equation is a difference equation. In the case considered here
(which was suggested by Stibitz' work) the variables are continuous
and a differential equation is involved.

In a determinate system with an n dimensional phase
space, whose motion is described by differential equations, we have

\[ \frac{dx^i}{dt} = f^i(x^1, x^2, \ldots, x^n), \qquad i = 1, 2, \ldots, n \qquad (1) \]

where the $x^i$ are coordinates in the phase space and t is time.

If we start with a probability distribution of points in phase space

\[ P(x^1, \ldots, x^n, t) \]

giving the probability density in the differential volume element
about $x^1, \ldots, x^n$ at time t, this distribution changes with time.
Its motion is described by the partial differential equation

\[ \frac{\partial P}{\partial t} = -\sum_{i=1}^{n} \frac{\partial}{\partial x^i} \left( f^i P \right) \]

or in tensor notation

\[ \frac{\partial P}{\partial t} = -\frac{\partial (f^i P)}{\partial x^i}. \]

This is evident if we think of P as a fluid density whose velocity
field is $f^i$.

Now suppose that as the representative point of the
system moves about the phase space it is continually subject to
small disturbances, which are of a probability type. Thus the
system tends to follow the solution of (1) but is continually
being disturbed by the probability effects, which may be thought
of as something like molecular collisions of the surrounding gas

m   %  m 

oa  a  aorta*  partlelo.    *o  art  Ui«rtitt4  la  taa  lioltla*  •*»• 
abort  taa  dltturbiat;  tffoota  are  wp  rapid  tout  T*rj  aaall.  If 
we  eeeuao  that  taa  &ata*aeaee  1*  aa»o«taeottt  aaa  Isotx-oplt, 
tfela  eta  bt  rtpreeeate*  ay  as  afldltloaal  tara  la  taa  equation  of 
tao  aeet  flow  typo 


In the more general case certain directions may be preferred, and
certain regions may have greater perturbation effects. Thus there
will generally be a small ellipsoid of probability about each point,
and a corresponding positive definite quadratic form

defined over the phase space. This form describes the local
statistical perturbing effects, for each point.
The equation then assumes the form

Talt  partial  differential  eonetioa  «©wae  tao  flo*  of  probability 
la  tao  panee  tpeee,    Utb  oa  eaeeable  of  eyatene  dlatribated  at 
t  m  0  aoooraUa  to  F0(al) 

tao  attribution  at  a  la  tar  tlao   t^   la  tao  eolation  of  (1)  for 

Tao  equation  (1)  la  llaoar  aad  of  parabulia  typo  (la  t). 
In  taa  x*  it  le  elliptleel,  aiaea  a1^  la  fOaltlra  definite. 

m  %  m 

Tao  total  .robubiUtj  la  tU  jftaao  0j*«*  *«asia  o^staai,  for  if 
vt  lot 

/  (a1*  5^  ♦  *«    •  « 

tfco  latogral  boia*  ow  o  *  xffUi*aUy  Xar*o  oarfaoo,  ud  ^  t&o 
volt  awaalt 

Xf  a1*  to  aosltivo  oafiaito  «o4  oota  a1**  aa* 
ar«  ooatUwotui  la  tao  aaaao  aaaoo  turn  4iatri»«tioa   v  approaaM 
a  ual$*o  Halt  as  t  HMK   ma  Halt  la  alia«r  s«o  owr*a«*ot 

tao  pNfesalUty  JOtaroaUa*  to  Uf laltf  o*  a  «o*iatt«  Uaitiag  4i#- 
tritouoa  r*  alta  . 


ft*  aay  %• 

f*a  iiaitiaa  alatritottloa  am*t  aatlofjr  tao  olU#tioal 
ofuatloa  ottaiaoa  ay  oottla*  ||  •  0, 

To  nuom  tact  the  aiitrihution  epproaohea  a  Halt  let 
P1  and  ?g    ee  two  different  solution*  of  ID.    Titea  the  dif- 
ference  o,  -  ?A  -  P^   al«o  satiafia*  the  equation  aad  ^  la 
poaltive  la  oaa  region  B  and  negative  la  tae  raaaladar  at  tae 
apace.    Consider  tae  cuani-ity 

U  auat  deer  ease  for 

where  S  la  tae  surface  of  tae  reeioa  B  aad  T  la  tae  outward 
Telooity  of  tale  ear  face.    Since  Q  vanishes  an  the  surface,  tae 
aeooad  tern  la  aero,  aad  tae  first  la 

Toluae  iategrale  of  diYeraaaeea  aad  traaafora  aj  tae 


usual  theorems  lato  surface  integrale 


tae  aeooad  tera  age  la  vanishes  alace  Q  -  0  on  S.  la  tae  first 
term  «A  la  la  tae  direction  of  ^   a©  at  any  point  we  have 

<  0 

Tims  a aj  initial  distribution 

?a  «4  ?j  H  dearaaaia«. 
•BprMMM  t*»  MM  Xiait. 


•  I  I* 

It   «^    is  SeuiUiMOOS,  *ftt       tots  ft  <U»«aatHuiUyt 

PwiH  b#>  o&u  lienors,  sad  tfcs  ▼sotor  SUE  ftl— aa  i  t— tsassj » 

Ths  saouat  of  tiiia  di««oatiault/  Is  £U  «&  fcy 
ft1*        -  ?j)  •  -  If*  -  ?*)  » 

*frtr«  tht  b***sd  «a4  uafcsjrr  »d  l«n«r*  *****  ts>  ti»«  two  tide* 
of  t&«  dltesoiiiuUt/.  Tims 

SMMyiftlsai  Aft  Mm  *»a  i1£m  o#  s*sft  i  1  nana** ****** g>gj - 

Xft  tSM  sUpisst  Oft«  &l»«ASiS*%l  •*»*  wft  fcm 

If  wo  «tort  with  ft  «opiko*  of  prooaoilitr  ioaaUaoa 
at  oao  point,  ta«  I— tllato  aoaowiar  aaa  bo  aaaarlaoa  la  oittjOo 
tor  a*,    aoar  talt  poUt  wa  **r  ohaaao  a1*  aad  f1  to  bo  aoaotaat. 
Do»  to  tao  f1  tao  aolxo  otartt  «crln«  vita  a  ▼•lojUy/*,  9111141 
too  pro»«oUltr  tors  a1*  •pr«*de  it  out.    If  wo  oottt  wUtt«i 
fro*  af  to 

wo  aooo  - 

*  '  „  „.  - "' 

aod  too  •quatioa  boaoaoa 

taio  ia  tha  o^uatioa  far  aoat  flaw  la  aa  aaiootropla  Bodlua. 
Thai  ia  ftao  y*  aooraiooto  too  «»i*o  dlffaooa  out  lata  a  mwu&m 
al»%rlb*tlaa  *ita  qoaArotU  form  a**|  for  th«  firot  afcort  iatorroi 
of  tiaa 

waoro  A. «  it  tao  laroroa  fora  of  a1* 

feliaa  Toioauy  rial*  gaj  aom^aaaaoaa  at*u«ti«ai  .wta. 

Om  portioalo?  mm  of  la  tor  tot  1*  ttei  la  w&iaa 

is  tUo  opooo.   ?at  a  oao  a&aooslaaal  aaaaa  opoco,tfeo  a$uatlaa  U) 
taaa  aaoaaao  ta«  faxa 

A  coaoxal  solution  far  tola  o*§o  &«s  *soa  foa&u   It  a*?  *o  dosaria  aa 
*a  mxoi>a*   It  wae  laltlol  41*t*iteatl©a  i»  a   s   foactioa,  aa  taa 
sjrataa  (or  0^aeabJL«)  ia  fcaooo  to  aaaa  a  daflalta  talus  at  x  at 
t  *  0,  say  P$   taaa  at  \±  taa  diatribe  Uoa  is  aoraal*   ?ao  saatax 

aM^a  aa^MP  ^^^W^ft^^rd  IsV^^^aa^aV^^Oj^    ^9    s^-$  jjj^L^WW^ 

Taus  taa  attn  £ oaroaaas  alaaa,  taa  ium  suits  aa  taa  aystoa  aaaid 
follow  am  taa  atatiattaal  sff oata  aasaat*   Hm  tarlaaaa  a* 
iaoraaaoa  axyaaaatiaUy  to  a  Ualtia*  taiaa  a/a  aita  aalf  taa  tlaa 

to  ay  ova  taat  taia  la  taa  aalatlaa  it  la  oaly  aaaosaojy 
to  saastitats  la  taa  oqoatiea  (*) ,  k*  t  —a*      too  tiatrisatloa 
approaafcaa  a  normal  aao  saatarad  aa  ««ro  ultn  a*  «  a/a* 

M  •   |U  -  of*) 

«*  »  $  (1  •  O****) 

«iu  oa  oroitrarr  iaitioi  aiotritaUoo  ?aU)  too  oolottoo  ono  bo 
written  *•  ma  mte*r&l  ««lo«  U&*  aotooo  of  lu^iiUm  of  keo* 
flow  9robl«gt»« 

•  /  **m  * 

foe  eeoe  teaerol  rooolto  aoX4  la  toe  I  aiooaeioaal 
I*hi  wh*$i       it  i  ltft»»y  fere  *&d  e^  1a  eooetnat*    A  *OollEO# 
of  probability  eroo&eaa  iAte  o  oorool  Aletrleotleo*  toe  ooefte* 
folio* la*  tfit  dtlsrslMU  trejeetery  oad  toe  qooArotlO  for* 
vfeloh  tekeo  toe  jtliot  of  the  etaoaor4  eOriatloo  toMMNOooi  eat* 
oeoeatioUy  towt  o  eef  Ulte  limit.   *ae  eveloeties  of  too 
e  one  tea  to  io  obob  aero  eoopUeat**  1*  tale  eeoe  oeeew,  ftoe 
eeootlooe  for  too  fiaal  aietrloaUoe  oro  *i*eo  io  too  ejeeodis« 

Xt  la  t&t  oao  Alaoaoloaal  llaaa*  aaoo  «•  rtwt  alta  a 
aoxaal  4lat*l*atioa  aaatoroa  oa  ao*o  aita   a*  •  £ ,  tao  distriOuUe* 
hm  am  ttftttjr  alta  t&«  Xoxm.   Aa  io&iTi&ual  oyttoa  oxoaotaa 
•totlotioal  aoUaa  aooot  aoro  aaa  tao  oaaaablo  of  »jst*m*  prodoooo 
aa  oaaoaalo  of  tiao  oarloa.    Tail  mmiU*  aaa  b«  oooa  to  ao 
oaultaloat  to  taoraal  aoiao  waiea  aaa  oooa  p*»»ed  tirou^a  a   t Utor 
with  troa»f«r  aaaxaoterlotla 

loa&lag  to  a  po»or  opaotrua  for  ta*  aoloo 

To  aaow  tola,  tao  aatoaoxrolatioa  aa/  oa  o*icul*t«a,  Urotoaa 
vaooo  vaXuo  at  t  •  0  !•  P  aato  a  aoraai  distribution  oaatataA 

m  *  t^  ia 

Aiotriootioa  at  t  *  4  la  aoeraal  vita  a§  •  J  . 

aaA  tala  ia  too  autoo  jxrolatioa. 

too  power  apootnta  la  tao  laavia*  taraaafon  at  aula 



cystic  ^^^^^^^^oa  -^x  .-^n.. 


ft    •  JLfftf*}  ft  ♦    *(*)  F) 

#%  OX  9* 

mix)  t  0.   la  **•  »t4»4y  «t*t« 

*UJ   f*  ♦  *(x)  *  •  0 
twadBi    ?,  0    «*  x  «*»  ±     •  *  o 

*U)     1  fix)  p  •  o 

I  *  1 1*1 

A  1»  A«t«ralA*4  V  *&•  •o&AiUaa  |p 

ttmMi  it  is  *•*•*»•*?        /tlx)  ii 

fix)  »>  • 

f  (x)  •  x<  • 


•  IS* 

»t  obt&U  ft*  **•  ma  •tatloattry  oolutioa 

•V1*  -     '  . 

^  s-*M 

-  .« 

of  «x?oa«aUftl«  6««?«ftftl&£  lot»4  *  «. 

*&6       I*  wtwttsl 

fte  satisfy  dp  •  o  »•  tfc* 
this  v««>1ym 

•a*  *1m» 


By R. B. Blackman, H. W. Bode, and C. E. Shannon [a]

[a] Bell Telephone Laboratories.

THE problem of data smoothing in fire control arises because
observations of target positions are never completely accurate. If the
target is located by radar, for example, we may expect errors in range
running from perhaps 10 to 50 yards in typical cases. Angular errors
may vary from perhaps one to several mils, corresponding, at
representative ranges, to yardage errors about equal to those mentioned
for range. Similar figures might be cited for the errors involved in
optical tracking by various devices. Evidently these errors in
observation will generate corresponding errors in the final aiming
orders delivered by the fire-control system.

A data-smoothing device is a means for minimizing the consequences of
observational errors by, in effect, averaging the results of
observations taken over a period of time. The simplest example of data
smoothing is furnished by artillery fire at a fixed land target. Here
the principal parameter is the range to the target. While individual
determinations of the range may be somewhat in error, a reliable
estimate can ordinarily be obtained by taking the simple average of a
number of such observations. This example, however, is scarcely a
representative one for problems in data smoothing generally. The errors
involved are small and the averaging process is an elementary one.
Moreover, the data-smoothing process is not of very decisive importance
in any case, since any errors which may exist in the estimated range
can normally be wiped out merely by observing the results of a few
trial shots.

More representative problems in data smoothing arise when we deal with
a moving target. In this case errors in observational data may be much
more serious, since they determine not only the present position of the
target but also the rates used in calculating how much the target will
move during the time it takes the projectile to reach it. An
illustration is furnished by antiaircraft fire against distant
airplanes. Suppose, for example, that in observing the target's
position we make two errors of opposite sign and a second apart, of
25 yards each. Then the apparent motion of the airplane is in error by
50 yards per second. Since the time of flight of an antiaircraft shell
in reaching its target may be as high as 30 seconds or more, such an
error might produce a miss of the order of 1 mile. It is clear that in
any comparable situation the effect of observational errors in
determining the target rate will be much greater than the position
error alone would suggest, and the function of the data-smoothing
network in averaging the data so that even moderately reliable rates
can be obtained as a basis for prediction becomes a critically
important one.

Aside from magnifying the consequences of small errors in target
position, the motion of the target complicates the data-smoothing
problem in two other respects. The first is the fact that it gives us
only a brief time in which to obtain suitable firing orders. The total
engagement is likely to last for only a brief time, and in any case it
is necessary to make use of the data before the target has time to do
something different. Thus the averaging process cannot take too long.
The second complication results from the fact that the true target
position is an unknown function of time rather than a mere constant.
Thus many more possibilities are open than would be the case with fixed
targets, and the problem of averaging to remove the effects of small
errors is correspondingly more elusive.

The intimate relation between data smoothing and target mobility
explains why the data-smoothing problem is relatively new in warfare.
The problem emerged as a serious one only recently, with the
introduction of new and highly mobile military devices. The airplane
is, of course, the archetype of such mobile instruments, and we have
already mentioned the data-smoothing problem as it appears in
antiaircraft fire. Since the relative velocity of airplane and ground
is the same whether we station ourselves on one or the other, however,
the




mobility of the airplane produces essentially
the same sort of problem in the design of bombsights
also. Another field exists in plane-to-plane
gunnery. Although they are somewhat
slower, the mobility of such vehicles as tanks
and torpedo boats is still considerable enough
to create a serious problem here also. Future
examples may be centered largely on robot
missiles. It is interesting to notice that a
guided missile may present a problem in data
smoothing either because it belongs to the
enemy, and is therefore something to shoot at,
or because it belongs to us, and requires
smoothing to correct errors in the data which
it uses for guidance. The tendency to higher
and higher speeds in all these devices must
evidently mean that fire control generally, and
data smoothing as one aspect of fire control,
must become more and more important, unless
war making can be ended.
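The antiaircraft illustration given earlier reduces to two lines of arithmetic, sketched here. The 25-yard errors and the one-second spacing come from the text; the 30-second time of flight is an assumption, chosen because it is the value consistent with the quoted miss "of the order of 1 mile". The function names are hypothetical.

```python
YARDS_PER_MILE = 1760

def rate_error(first_error_yd, second_error_yd, interval_s):
    """Apparent velocity error (yd/s) produced by two position errors
    made interval_s seconds apart."""
    return (second_error_yd - first_error_yd) / interval_s

def miss_distance(rate_error_yd_s, time_of_flight_s):
    """Prediction error (yd) accumulated over the shell's time of flight."""
    return abs(rate_error_yd_s) * time_of_flight_s

v_err = rate_error(-25.0, 25.0, 1.0)   # errors of opposite sign, 25 yards each
miss = miss_distance(v_err, 30.0)      # assumed 30 s time of flight
miss_miles = miss / YARDS_PER_MILE

print(v_err, miss, round(miss_miles, 2))
```

A 25-yard position error thus becomes a miss of about 1,500 yards once it enters the rate, which is why smoothing matters far more for moving targets than for fixed ones.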

Very  mobile  instruments  of  war,  such  as 
the  airplane,  began  to  make  their  appearance 
in  World  War  I,  but  there  was  insufficient  time 
during  that  war  to  make  much  progress  with 
the  fire-control  problems  which  such  instru- 
mentalities imply.  In  the  interval  between 
World  War  I  and  World  War  II,  however,  a 
considerable  number  of  fire-control  devices, 
such  as  bombsights  and  antiaircraft  compu- 
ters, were  developed.  The  principal  attention 
in  the  design  of  these  devices,  however,  was 
on  the  kinematical  aspects  of  the  situation. 
Although  a  number  of  them  included  fairly 
successful  methods  of  minimizing  the  effects  of 
observational  errors,b  it  seems  fair  to  say  that 
in  the  interval  between  the  two  wars  there 
was  no  general  appreciation  of  the  existence  of 
the  data-smoothing  problem  as  such. 

It follows that the theory of data smoothing
advanced in this monograph is the result principally
of experience gained in World War II. More
specifically, it is the product of the experience of
the authors with a series of projects, largely
sponsored by Division 7 of NDRC, concerned with the
design of electrical antiaircraft directors. In
addition, it draws largely on the results of a number
of other investigations, also NDRC sponsored. The
possible key importance of data smoothing in the
design of fire-control systems was recognized by
Division 7 early in the course of its activities and
the emphasis placed upon it in a number of projects
led to the accumulation of a much larger body of
results than might otherwise have been

b Most of these solutions depended upon the use of
special types of tracking systems. Examples are found
in the use of regenerative tracking in bombsights and
antiaircraft computers or in the determination of
rates from a precessing gyroscope or an aided laying
mechanism in an antiaircraft tracking head. So far as
their effect on the data-smoothing characteristics of
the overall circuit is concerned, these devices are
equivalent to simple types of smoothing networks
inserted directly in the prediction system. This is
discussed in more detail under the heading
"Exponential Smoothing," Section 10.1.

Data  smoothing  is  developed  here  in  terms 
of  concepts  familiar  in  communication  engi- 
neering. This  is  a  natural  approach  since  data 
smoothing  is  evidently  a  special  case  of  the 
transmission,  manipulation,  and  utilization  of 
intelligence.  The  other  principal,  and  perhaps 
still  more  fundamental,  approach  to  data 
smoothing  is  to  regard  it  as  a  problem  in  sta- 
tistics. This  is  the  line  followed  in  the  classic 
work1 by Norbert Wiener.c For reasons which
are  brought  out  later,  Wiener's  theory  is  not 
used  in  the  present  monograph  as  a  basis  for 
the  actual  design  of  data-smoothing  networks. 
Because of its fundamental interest, however,
a  sketch  of  Wiener's  theory  is  included.  The 
authors'  apologies  are  due  for  any  mutilation 
to  the  theory  caused  by  the  attempt  to  simplify 
it  and  compress  it  into  a  brief  space. 

The  present  monograph  falls  roughly  into 
two  dissimilar  halves.  The  first  half,  consist- 
ing of  the  first  three  or  four  chapters,  includes 
a  discussion  of  the  general  theoretical  founda- 
tions of  the  data-smoothing  problem,  the  best 
established  ways  of  approaching  the  prob- 
lem, the  assumptions  they  involve,  and  the 
authors'  judgment  concerning  the  assumptions 
which  best  fit  the  tactical  facts.  In  this  part 
may  also  be  included  the  last  chapter,  which 
contains  a  fragmentary  discussion  of  alterna- 
tive data-smoothing  possibilities  lying  outside 
the main theoretical framework of the monograph.

The rest of the monograph is concerned with the
technique of designing specific data-smoothing
structures. A fairly elaborate and detailed treatment
is given here, in the belief that the problem of
actually realizing a suitable data-smoothing device
is, in some ways at least, as difficult as that of
deciding what the general properties of such a device
should be. The technique, as given, draws heavily
upon the highly developed resources of electric
network theory. For this reason the discussion is
couched entirely in electrical language, although the
authors realize, of course, that equivalent
nonelectrical solutions may exist. For the benefit of
readers who may not be familiar with network theory,
the monograph includes an appendix summarizing the
principles most needed in the main text.

c Wiener is also responsible for providing tools which
permit the gap between the statistical and
communication points of view to be bridged.

Two  further  remarks  may  be  helpful  in  un- 
derstanding the  monograph.  The  first  concerns 
the  relation  between  data  smoothing  and  the 
overall  problem  of  prediction  in  a  fire-control 
circuit.  These  two  are  coupled  together  in  the 
title  of  the  monograph,  and  it  is  clear  that  the 
connection  between  them  must  be  very  close, 
since,  as  we  saw  earlier,  small  irregularities  in 
input  data  are  likely  to  be  serious  only  as  they 
affect  the  extrapolation  used  to  determine  the 
future  position  of  a  moving  target.  In  the 
statistical  approach,  in  fact,  data  smoothing 
and  prediction  are  treated  as  a  single  problem 
and  a  single  device  performs  both  operations. 

In  the  attack  which  is  treated  at  greatest 
length  in  the  monograph  a  certain  distinction 
between  data  smoothing  and  prediction  can  be 
made.  To  simplify  the  exposition  as  much  as 
possible, the explicit discussion in the monograph
is directed principally at data smoothing. This,
however, is not intended to suggest
that  there  is  any  real  cleavage  between  the 
two  problems  or  that  the  analysis  as  developed 
in  the  monograph  does  not  also  bear,  by  impli- 
cation, upon  the  prediction  problem.  Any  the- 
ory of  data  smoothing  must  rest  ultimately 
upon  some  hypothesis  concerning  the  path  of 
the  target,  and  the  exact  statement  of  the  as- 
sumptions to  be  made  is  in  many  ways  the  most 
important  as  well  as  the  most  difficult  part  of 
the  problem.  The  same  assumptions,  however, 
are  also  involved  in  the  extrapolation  to  the 
future  position  of  the  target.  It  is  thus  impos- 
sible to  solve  the  data-smoothing  problem  with- 
out also  implying  what  the  general  nature  of 
the  prediction  process  will  be.  For  example, 
the formulation given in Chapter 9 amounts to the
assumption that the target path is specified by a set
of geometrical parameters corresponding to components
of velocity, acceleration, etc. The data-smoothing
process centers about the
problem  of  obtaining  reliable  values  for  these 
parameters.  To  obtain  a  complete  prediction 
thereafter,  it  is  merely  necessary  to  multiply 
the  parameter  values  thus  obtained  by  suitable 
functions  of  time  of  flight  and  add  the  results 
to  the  present  position  of  the  target. 
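
As an illustration of this last step, suppose, purely
for the sake of example, that the parameters obtained
for one coordinate are a constant velocity v and a
constant acceleration a (symbols introduced here and
not used in the text). Writing x_P for the present
position, x_F for the predicted future position, and
t_f for the time of flight, the "suitable functions of
time of flight" would then be t_f and t_f^2/2:

```latex
x_F = x_P + v\,t_f + \tfrac{1}{2}\,a\,t_f^{2}
```

With a = 0 this reduces to the straight-line,
constant-speed extrapolation.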

The  other  general  remark  concerns  the  tacti- 
cal criteria  used  in  evaluating  the  performance 
of  a  data-smoothing  system.  This  turns  out  to 
be  one  of  the  most  important  aspects  of  the 
whole  field.  It  is  assumed  here  that  the  tactical 
situation  is  similar  to  that  of  antiaircraft  fire 
against  high-altitude  bombers  in  World  War 
II.  The  defense  can  be  regarded  as  successful  if 
only  a  fairly  small  fraction  of  the  targets  en- 
gaged are  destroyed.  On  the  other  hand,  the 
lethal  radius  of  the  antiaircraft  shell  is  so  small 
that  it  is  also  quite  difficult  to  score  a  kill. 
Under these circumstances we are interested
only  in  increasing  the  number  of  very  well 
aimed  shots. 

When  we  combine  these  assumptions  with 
the  path  assumptions  described  in  Chapter  9 
we  are  led  to  the  data-smoothing  solution  for- 
mulated here,  in  preference  to  the  solution  ob- 
tained with  the  statistical  approach.  On  the 
other  hand,  we  might  equally  well  envisage  a 
situation  in  which  the  target  contained  an 
atomic  bomb  or  some  other  very  destructive 
agent,  so  that  it  becomes  very  important  to 
intercept  it,  while  the  lethal  radius  of  the  anti- 
aircraft missile  is  correspondingly  increased, 
so  that  great  accuracy  is  not  needed  for  a  kill. 
In  this  situation  our  interest  would  be  focused 
on  the  problem  of  minimizing  the  probability 
of making large misses, and the solution furnished
by the statistical approach would be approximately
the best obtainable.d

d In fairness to the statistical solution it should be
pointed out that it is also the best obtainable, without
regard  to  the  lethal  radius  of  the  shell,  if  we  replace 
the  path  assumptions  made  in  Chapter  9  by  a  "random 
phase"  assumption.  The  path  assumptions  in  Chapter 
9  are  almost  at  the  opposite  pole  from  a  random  phase 
assumption,  and  represent  a  deliberate  overstatement, 
made  in  order  to  illustrate  the  theoretical  situation  as 
clearly  as  possible. 


Chapter  7 


One of the principal difficulties in any
treatment  of  data  smoothing  is  that  of 
stating  exactly  what  the  problem  is  and  what 
criteria  should  be  applied  in  judging  when  we 
have  a  satisfactory  solution.  It  is  consequently 
necessary  to  embark  upon  a  rather  extensive 
general  discussion  of  the  data-smoothing  prob- 
lem before  it  is  possible  to  consider  specific 
methods  of  designing  data-smoothing  struc- 
tures. This  preliminary  survey  will  occupy 
Chapters  7,  8,  and  9.  As  a  first  step  this  chap- 
ter will  describe  two  of  the  general  ways  in 
which  the  data-smoothing  problem  can  be  ap- 
proached mathematically.  The  formulation  of 
the  problem  which  is  finally  reached  in  Chap- 
ter 9  is  not  the  one  which  is  most  obviously 
suggested  by  these  approaches.  This,  however, 
does  not  lessen  their  value  in  characterizing 
the  problem  broadly. 



In  an  actual  fire-control  system  the  data- 
smoothing  problem  is  usually  made  fairly  spe- 
cific because  of  the  particular  geometry 
adopted  in  the  computer.  It  may  be  helpful 
to  have  some  particular  case  in  mind  as  a 
touchstone  in  interpreting  the  general  discus- 
sion. For  this  purpose  the  most  appropriate 
example  is  furnished  by  long  range  land-based 
antiaircraft  fire,  since  most  of  the  analysis 
described  in  this  monograph  was  developed 
originally  for  its  application  to  this  problem. 
It  is  usually  assumed  in  the  antiaircraft  prob- 
lem that  the  target  flies  in  a  straight  line  at 
constant  speed,  and  in  one  case  at  least  the 
computer  operates  by  converting  the  input  data 
into  Cartesian  coordinates  of  target  position 
and  differentiating  these  to  find  the  rates  of 
travel  in  the  several  Cartesian  directions. 
These  rates  form  the  basis  of  the  extrapolation. 

The process is illustrated in Figure 1. The input
coordinates are transformed into electrical voltages
proportional to x_P, y_P, and z_P, the Cartesian
coordinates of present position, in the coordinate
converter at the left of the diagram. The
extrapolation for x is shown explicitly. It consists
essentially in differentiating to find the x
component of target velocity, multiplying the
derivative by the time of flight t_f and adding the
result to x_P to find x_F, the predicted future value
of x. A similar procedure fixes y_F and z_F. After
the addition of certain ballistic corrections, these
three coordinates of future position are transformed
into gun aiming orders in the coordinate converter
shown at the right of the drawing. This last unit
also provides the time of flight required as a
multiplier in the extrapolation.

[Figure 1. Data ... prediction circuit.]

The  small  irregularities  in  the  input  data 
caused  by  tracking  errors  are  greatly  magni- 
fied by  the  process  of  differentiation.  It  is  thus 
necessary  to  smooth  the  rates  considerably  if 
a  reliable  extrapolation  is  to  be  secured.  The 
data-smoothing  network  for  the  x  coordinate  is 
represented by N_1 in Figure 1. Since the Cartesian
velocity components are theoretically
constants  if  the  assumption  of  a  straight  line 
course  at  constant  speed  is  correct,  a  data- 
smoothing  network  in  this  computer  must  be 
essentially  an  averaging  device  which  gives 
an  appropriately  weighted  average  of  the  fluc- 
tuating instantaneous  rate  values  fed  to  it.  The 
problem  of  "smoothing  a  constant"  is  given 
special  attention  in  Chapter  10.  Aside  from  the 
particular  circuit  of  Figure  1,  we  may,  of 
course,  be  required  to  smooth  a  constant  when- 
ever the  prediction  is  based  upon  an  assumed 
geometrical  course  involving  one  or  more  pa- 
rameters which  are  isolated  in  the  circuit. 
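
The chain just described (differentiate, smooth the
rate, multiply by time of flight, add to present
position) can be sketched numerically. The following
is only an illustration under stated assumptions:
sampled data, a uniformly weighted average standing in
for the "appropriately weighted average," and made-up
values for the course, the tracking error, and the
time of flight. The function names are hypothetical.

```python
import numpy as np

def moving_average(values, n_avg):
    """Uniformly weighted running average over n_avg consecutive samples."""
    return np.convolve(values, np.ones(n_avg) / n_avg, mode="valid")

def predict_future_position(x_obs, dt, t_f, n_avg):
    """Return x_F = x_P + (smoothed rate) * t_f from sampled positions."""
    rates = np.diff(x_obs) / dt               # differentiation magnifies tracking errors
    smoothed_rate = np.mean(rates[-n_avg:])   # "smoothing a constant" by averaging
    return x_obs[-1] + smoothed_rate * t_f

# Hypothetical numbers: straight-line course, independent tracking errors.
rng = np.random.default_rng(0)
dt, t_f, n_avg = 0.1, 10.0, 50
t = np.arange(200) * dt
x_true = 1000.0 + 150.0 * t
x_obs = x_true + rng.normal(0.0, 5.0, size=t.size)

raw_rates = np.diff(x_obs) / dt
smoothed_rates = moving_average(raw_rates, n_avg)
print("scatter of raw rates:     ", np.std(raw_rates))
print("scatter of smoothed rates:", np.std(smoothed_rates))
print("predicted x_F:            ", predict_future_position(x_obs, dt, t_f, n_avg))
```

The scatter of the differentiated data is far larger
than that of the position data themselves, which is
why the smoothing is applied to the rate.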





In  addition  to  smoothing  the  rates  we  can, 
if  we  like,  attempt  to  smooth  the  irregularities 
in  present  position  also.  A  network  to  accom- 
plish this  purpose  is  indicated  by  the  broken 
line structure N_2 in Figure 1. Of course, in
dealing  with  the  present  position  we  are  no 
longer  smoothing  a  constant,  but  suitable  struc- 
tures can  be  obtained  by  methods  described 
later.  However,  the  effect  of  tracking  errors  in 
the  present  position  circuit  is  so  much  less  than 
it is in the rate circuit that N_2 can generally
be  omitted. 

Geometrical  assumptions  of  the  sort  implied 
in  Figure  1  are  helpful  in  visualizing  the  prob- 
lem, and  they  are  of  course  of  critical  impor- 
tance in  determining  what  the  final  data- 
smoothing  device  will  be.  It  is  important  not 
to  make  explicit  assumptions  of  this  kind  too 
early  in  the  formal  analysis,  however,  since 
the  meaning  of  such  assumptions  is  one  of  the 
aspects  of  the  general  problem  which  must  be 
investigated.  For  example,  it  is  apparent  that 
no  airplane  in  fact  flies  exactly  a  straight  line, 
nor  flies  a  straight  line  for  an  indefinite  period. 
In  detail,  the  solution  of  the  data-smoothing 
problem  depends  very  largely  on  how  we  treat 
these  departures  from  the  idealized  straight 
line  path.  For  the  present,  consequently,  it  will 
be  assumed  that  the  input  data  are  presented 
to  the  data-smoothing  and  predicting  devices 
in  terms  of  some  generalized  coordinates,  the 
nature of which we will not inquire into too
closely.  A  given  coordinate  might,  for  example, 
be  a  velocity,  a  radius  of  curvature,  an  angle  of 
dive  or  climb,  or  any  other  quantity  which 
would  be  directly  useful  in  making  a  predic- 
tion, or  it  might  be  a  simple  position  coordi- 
nate such  as  an  azimuth  or  an  altitude. 

The  data-smoothing  and  predicting  opera- 
tion itself  is  assumed  to  be  performed  by  linear 
invariable  devices.  Aside  from  the  fact  that 
this  assumption  is,  of  course,  a  tremendously 
simplifying  one,  it  also  fits  the  data-smoothing 
problem  very  nicely,  as  the  problem  is  formu- 
lated in  this  chapter.  With  other  formulations, 
however,  it  appears  that  somewhat  better  re- 
sults may  be  obtainable  from  variable  devices 
or  devices  including  more  or  less  radical 
amounts  of  nonlinearity.  These  possibilities  are 
discussed  briefly  in  Chapter  14. 


Figure  1  illustrates  a  distinction  between 
two  possible  methods  of  looking  at  the  data- 
smoothing  problem  which  it  is  advisable  to 
establish  for  future  purposes.  In  describing  the 
x  system  in  Figure  1  we  laid  emphasis  on  the 
particular networks N_1 and N_2. It is clear,
however, that the complete x circuit with input x_P
and output x_F is a network having overall
transmission properties which can be studied. Since
t_f will normally vary with time, the network is not,
strictly speaking, an invariable one, but the changes
of t_f are ordinarily too
slow  to  make  this  an  essential  consideration. 

When  it  is  necessary  to  make  a  distinction 
between  these  points  of  view,  a  network  such 
as N_1, which is merely an element in the prediction
process, will be called a data-smoothing
structure.  An  overall  circuit,  providing  data 
smoothing  and  prediction  in  one  step,  will  be 
called  a  data-smoothing  and  prediction  net- 
work, or  simply  a  prediction  network.  Al- 
though these  points  of  view  have  been  illus- 
trated for  rectangular  coordinates,  they  obvi- 
ously apply  also  in  many  other  situations.  For 
example,  we  might  go  so  far  as  to  apply  the 
overall  point  of  view  to  a  complete  circuit  from 
input  azimuth,  say,  to  output  azimuth. 

Both  points  of  view  are  taken  from  time  to 
time  in  the  monograph.  When  possible,  how- 
ever, principal  attention  has  been  given  to  the 
limited  data-smoothing  problem.  This  tends  to 
simplify  the  discussion,  since  the  limited  prob- 
lem is  evidently  more  concrete  than  the  overall 
prediction  problem.  Moreover,  it  permits  us  to 
deal  lightly  with  such  questions  as  the  particu- 
lar choice  of  coordinates  in  which  the  smooth- 
ing operations  are  conducted,  since  it  assumes 
that  the  general  kinematical  framework  of  pre- 
diction has  already  been  decided  upon.  On  the 
other  hand,  the  overall  point  of  view  is  more 
effective  in  certain  situations,  and  it  is  the  only 
natural  one  in  the  statistical  treatment  de- 
scribed in  the  next  section. 


The most direct and perhaps the most general
approach to data smoothing consists in regarding it
as a problem in time series. This
is  the  approach  used  by  Wiener  in  his  well- 
known  work.1  It  essentially  classifies  data 
smoothing  and  prediction  as  a  branch  of  statis- 
tics. The  input  data,  in  other  words,  are 
thought  of  as  constituting  a  series  in  time 
similar  to  weather  records,  stock  market  prices, 
production  statistics,  and  the  like.  The  well- 
developed  tools  of  statistics  for  the  interpreta- 
tion and  extrapolation  of  such  series  are  thus 
made  available  for  the  data-smoothing  and 
prediction  problem. 

To formulate the problem in these terms, let f(t)
represent the true value of one of the coordinates of
the target and let g(t) represent the observational
error. Then f(t) and g(t) are both time series in the
sense just defined. The set of all such functions
corresponding to the various possible target courses
and tracking errors forms an ensemble of time series
or a statistical population. One can imagine that a
large number of particular functions f(t) and g(t)
have been recorded, each with a frequency
proportional to its actual frequency of occurrence.
Wiener assumes that they are stationary, that is,
that the statistical properties of the ensemble are
independent of the origin of time. This, of course,
implies that both functions exist from t = -∞ to
t = +∞. We will sometimes find it more convenient to
make the assumption that the two functions vanish
after some fixed, but sufficiently remote, points on
the positive and negative real t axis.a

The input signal to the computer is of course
f(t) + g(t). If we assume that the coordinate in
question represents a position, the quantity we wish
to obtain is f(t + t_f), where t_f represents the
prediction time. If the coordinate is a rate, we are
interested in an average value of f(t) over the
prediction interval. This complicates the mathematics
somewhat, but does not essentially affect the
situation.

a This is done for technical mathematical reasons. We
shall later have occasion to consider the Fourier
transforms of f(t) and g(t), and, to have
well-defined transforms, the integrals of the squares
of the two functions, from t = -∞ to t = +∞, should
be finite. This would not happen under the
"stationary" assumption. Wiener avoids the difficulty
by introducing what he calls a generalized harmonic
analysis, but this method is far too complicated to
be treated in a brief sketch like the present.

We shall not, of course, be able to predict
f(t + t_f) perfectly accurately. Let the predicted
value be represented by f*(t + t_f). In virtue of our
assumption that the data-smoothing and prediction
circuit is to be a linear invariable network, the
relation between f*(t + t_f) and the total input
signal f(t) + g(t) can be written as

$$f^*(t + t_f) = \int_0^{\infty} \left[ f(t - \sigma) + g(t - \sigma) \right] dK(\sigma) \tag{1}$$

where dK(σ) represents the effect of the
data-smoothing and prediction circuit. Comparison to
equations (2) and (5) of Appendix A shows that K is,
in fact, the indicial admittance of this circuit. The
particular problem to be solved is of course that of
finding a shape for the function K(σ) which will make
f*(t + t_f) the best possible estimate of f(t + t_f).

The fact that the integration in equation (1) is
limited at σ = 0 is particularly to be noted. It
corresponds to the fact that in making a prediction
we are entitled to use only the input data which has
accumulated up to the prediction instant. This
restriction will be conspicuous in the next chapter,
where the time-series analysis is completed.


The principal statistical tool used in studying
equation (1) is the so-called autocorrelation. Under
the "stationary" assumption the autocorrelation for
f(t) is defined by

$$\phi_1(\tau) = \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} f(t + \tau)\, f(t)\, dt. \tag{2}$$

We can obtain a normalized autocorrelation, which is
more convenient for some purposes, by dividing by
φ_1(0). This gives

$$\phi_{1N}(\tau) = \frac{\phi_1(\tau)}{\phi_1(0)} = \lim_{T \to \infty} \frac{\int_{-T}^{T} f(t + \tau)\, f(t)\, dt}{\int_{-T}^{T} [f(t)]^2\, dt}. \tag{3}$$

If we assume that f(t) in fact vanishes for
sufficiently large positive or negative values of t,
the limit sign can be disregarded and φ_1N(τ) becomes
simply

$$\phi_{1N}(\tau) = \frac{\int_{-\infty}^{\infty} f(t + \tau)\, f(t)\, dt}{\int_{-\infty}^{\infty} [f(t)]^2\, dt} \tag{4}$$

where the denominator, the integral of [f(t)]^2 over
all t, represents the total "energy" in the time
series f(t).

Precisely similar expressions can be set up for the
autocorrelation φ_2(τ) or φ_2N(τ) of the
observational error function g(t). In a general case
we might also have to worry about a possible cross
correlation between f(t) and g(t). This would be
represented by a cross-correlation function φ_12(τ),
obtained by integrating the product f(t + τ)g(t). In
practical fire control, however, it can be assumed
that the correlation between target course and
tracking errors is small enough to be neglected.

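As a simple example of the calculation of an
autocorrelation we may assume that f(t) = sin ωt.
Then

$$\begin{aligned}
\phi_1(\tau) &= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \sin \omega(t + \tau)\, \sin \omega t \; dt \\
&= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \tfrac{1}{2} \left[ \cos \omega\tau - \cos (2\omega t + \omega\tau) \right] dt \\
&= \tfrac{1}{2} \cos \omega\tau,
\end{aligned} \tag{5}$$

since the term cos (2ωt + ωτ) will contribute
nothing in the limit.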
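
Equation (5) can be checked numerically by replacing
the limit with a long but finite averaging interval.
The sketch below is only a rough check: the value of
ω, the interval T, and the grid size are arbitrary
choices, and the function name is hypothetical.

```python
import numpy as np

def time_average_autocorrelation(func, tau, T, n=200_001):
    """Approximate (1/2T) * integral over [-T, T] of func(t + tau) * func(t) dt."""
    t = np.linspace(-T, T, n)
    return float(np.mean(func(t + tau) * func(t)))

omega = 2.0  # hypothetical angular frequency

def f(t):
    return np.sin(omega * t)

for tau in (0.0, 0.5, 1.3):
    approx = time_average_autocorrelation(f, tau, T=500.0)
    exact = 0.5 * np.cos(omega * tau)
    print(tau, approx, exact)
```

The two columns agree to within the small residue left
by the cos (2ωt + ωτ) term over a finite interval.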

The maximum value of φ_1(τ) in (5) is found at
τ = 0. This is to be expected, since obviously the
correlation between identical values of the function
is the best possible. What is exceptional about the
present result is the fact that φ_1(τ) is not small
for all large τ's. This is fundamentally a
consequence of the fact that we chose an analytic
expression for f(t), so that the relation between two
values of the function is completely determinate, no
matter how great the difference between their
arguments. In a more representative time series,
involving a certain amount of statistical
uncertainty, we would expect φ_1(τ) to approach zero
as τ increases, reflecting the increasing importance
of statistical dispersion as the time interval
becomes greater.

The significance of the autocorrelation function for
data smoothing and prediction is obvious without much
study. Thus, suppose for simplicity that the
observational error g(t) is zero. Then the
autocorrelation φ_1(τ) is the only one involved. It
is a measure of the extent to which the true target
path "hangs together" and is thus predictable. For
example, in weather forecasting it is a well-known
principle that in the absence of any other
information it is a reasonably good bet that
tomorrow's weather will be like today's but that the
reliability of such a prediction diminishes rapidly
if we attempt to go beyond two or three days. This
would correspond to an autocorrelation function which
is fairly large in the neighborhood of τ = 0, but
diminishes rapidly to zero

In a similar way the autocorrelation of the
observational error g(t) represents the extent to
which this error hangs together. In this case,
however, a high correlation is exactly what we do not
want. Thus, if φ_2(τ) vanishes rapidly as τ increases
from zero, closely neighboring values of g are quite
uncorrelated, and we need only average the input data
over a short interval in the immediate past in order
to have most of the observational errors averaged
out. If φ_2(τ) is substantial for a much longer
range, on the other hand, a much longer averaging
period is necessary, with correspondingly greater
uncertainties in the value obtained for f(t).
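
The effect of error correlation on averaging can be
made concrete with a small calculation. As an
assumption made only for this sketch, suppose the
errors are sampled at equal intervals and that their
autocorrelation falls off geometrically, φ_2 = σ²ρ^|k|
at a separation of k samples. The variance of an
n-sample average is then the mean of φ_2 over all
pairs of samples. The function name is hypothetical.

```python
import numpy as np

def variance_of_average(n, rho, sigma2=1.0):
    """Variance of the mean of n samples whose autocorrelation is sigma2 * rho**|k|."""
    idx = np.arange(n)
    lags = np.abs(np.subtract.outer(idx, idx))   # |i - j| for every pair of samples
    return float(sigma2 * np.sum(rho ** lags) / n**2)

for rho in (0.0, 0.5, 0.9):
    print(rho, variance_of_average(10, rho))
```

With ρ = 0 the variance falls as 1/n, while for ρ near
1 the same n samples average out very little of the
error, so that a longer averaging period is needed.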


The autocorrelation function does not in itself
suffice to determine a time series completely. For
example, it is easily seen that the functions
sin t + sin 2t and sin t + cos 2t have
the  same  autocorrelation  in  spite  of  the  fact 
that  they  represent  waves  of  quite  different 
shape.  The  autocorrelation  function,  however, 
has  a  peculiar  importance  in  the  fact  that 
under  many  circumstances  it  is  the  only  piece 
of  information  about  the  time  series  which  we 
need  to  know. 
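
The statement about sin t + sin 2t and sin t + cos 2t
is easily confirmed numerically. In the sketch below
the time average is taken over one full period 2π,
which for these periodic functions gives the same
value as the limit in equation (2). The function
names are hypothetical.

```python
import numpy as np

def periodic_autocorrelation(func, tau, n=4096):
    """Average of func(t + tau) * func(t) over one period 2*pi."""
    t = np.arange(n) * (2.0 * np.pi / n)
    return float(np.mean(func(t + tau) * func(t)))

def wave_a(t):
    return np.sin(t) + np.sin(2.0 * t)

def wave_b(t):
    return np.sin(t) + np.cos(2.0 * t)

taus = np.linspace(0.0, 2.0 * np.pi, 9)
acf_a = np.array([periodic_autocorrelation(wave_a, tau) for tau in taus])
acf_b = np.array([periodic_autocorrelation(wave_b, tau) for tau in taus])
print(np.max(np.abs(acf_a - acf_b)))   # difference between the two autocorrelations
```

Both autocorrelations equal (1/2) cos τ + (1/2) cos 2τ,
since the relative phase of the two components drops
out of the average.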

The significance of the autocorrelation becomes
apparent as soon as we investigate the error in
prediction. In many mathematical situations involving
linear systems it is convenient to deal with the
square of the error rather than with the error
itself, since a first variation in the error squared
expression gives a linear relationship in the
quantities of direct interest. We will deal with the
square of the error here. If E represents the
instantaneous error, f*(t + t_f) - f(t + t_f), the
mean square error over a long period of time is
evidently

$$\begin{aligned}
\overline{E^2} &= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} \left[ f^*(t + t_f) - f(t + t_f) \right]^2 dt \\
&= \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} [f(t + t_f)]^2\, dt
 - \lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} f(t + t_f)\, f^*(t + t_f)\, dt \\
&\quad + \lim_{T \to \infty} \frac{1}{2T} \int_{-T}^{T} [f^*(t + t_f)]^2\, dt.
\end{aligned} \tag{6}$$

The first integral in equation (6) can be evaluated
immediately. From (2) it is φ_1(0). To evaluate the
second integral replace f*(t + t_f) by its definition
from (1). This gives

$$-\lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} f(t + t_f)\, dt \int_0^{\infty} \left[ f(t - \tau) + g(t - \tau) \right] dK(\tau)
= -\lim_{T \to \infty} \frac{1}{T} \int_0^{\infty} dK(\tau) \int_{-T}^{T} \left[ f(t + t_f) f(t - \tau) + f(t + t_f) g(t - \tau) \right] dt$$

if we reverse the order of integration. Since we
assume that f and g are uncorrelated, however, the
product f(t + t_f)g(t - τ) in this expression makes
no contribution to the final result, and by replacing
the integral of f(t + t_f)f(t - τ) by its value in
terms of φ_1 the expression as a whole can be written
as

$$-2 \int_0^{\infty} \phi_1(t_f + \tau)\, dK(\tau).$$

The third integral in (6) can be simplified in
similar fashion. The final result becomes

$$\overline{E^2} = \phi_1(0) - 2 \int_0^{\infty} \phi_1(t_f + \tau)\, dK(\tau)
+ \int_0^{\infty} dK(\tau) \int_0^{\infty} \left[ \phi_1(\tau - \sigma) + \phi_2(\tau - \sigma) \right] dK(\sigma). \tag{7}$$

The only quantities appearing in equation (7) are
the autocorrelations, φ_1 and φ_2, of the true target
path and the observational error, and the function K
which specifies the data-smoothing structure. The
theoretical problem with which we are confronted is
evidently that of choosing K to make the mean square
error as small as possible for any given φ's. This
problem  will  not  be  attacked  here,  although  a 
solution  obtained  by  a  somewhat  indirect 
method  is  presented  in  the  next  chapter.  The 
principal  reason  for  deriving  equation  (7)  is 
to  demonstrate  the  very  important  fact  that 
the  mean  square  error  depends  only  upon  the 
two  autocorrelations.  No  other  characteristics 
of  the  input  data  need  be  considered. 
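
The dependence of the mean square error on the two
autocorrelations alone can be illustrated with a
sampled analogue of equation (7). In this sketch,
which rests on assumptions not made in the text, the
smoothing and prediction circuit is replaced by a
finite set of weights w_k applied to the present and
past input samples, time averages are taken over one
period of periodic test signals, and f and g are
sinusoids of different frequencies so that their cross
correlation vanishes exactly. All names and numbers
are hypothetical.

```python
import numpy as np

def circ_corr(x, y, lag):
    """Circular correlation: average over n of x[n + lag] * y[n]."""
    return float(np.mean(np.roll(x, -lag) * y))

def direct_mse(f, g, w, m):
    """Mean square error found by actually forming the prediction."""
    s = f + g
    # f_star[n] estimates f[n + m] from the samples s[n], s[n-1], ...
    f_star = sum(wk * np.roll(s, k) for k, wk in enumerate(w))
    return float(np.mean((f_star - np.roll(f, -m)) ** 2))

def formula_mse(f, g, w, m):
    """Mean square error from the autocorrelations only (sampled form of (7))."""
    phi1 = lambda lag: circ_corr(f, f, lag)
    phi2 = lambda lag: circ_corr(g, g, lag)
    cross = sum(wk * phi1(m + k) for k, wk in enumerate(w))
    quad = sum(wj * wk * (phi1(j - k) + phi2(j - k))
               for j, wj in enumerate(w) for k, wk in enumerate(w))
    return phi1(0) - 2.0 * cross + quad

N = 512
n = np.arange(N)
f = np.sin(2.0 * np.pi * 3 * n / N)                # slowly varying "target path"
g = 0.3 * np.sin(2.0 * np.pi * 40 * n / N + 0.7)   # faster "tracking error"
w = [0.5, 0.3, 0.2]                                # arbitrary smoothing weights
m = 5                                              # prediction time in samples

print(direct_mse(f, g, w, m), formula_mse(f, g, w, m))
```

The two numbers agree to rounding error, although the
second is computed without reference to the wave
shapes of f and g.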

It  will  be  recalled  that  the  mean  square  cri- 
terion was  introduced  originally  on  the  ground 
of  mathematical  convenience.  This  leaves  un- 
settled the  question  of  how  good  a  measure  of 
performance for a data-smoothing network it
actually  is.  This  is  a  critical  question,  since 
upon  it  depends  the  validity  of  the  whole  ap- 
proach outlined  in  this  chapter.  A  priori,  the 
least  squares  criterion  is  a  dubious  one  since 
it  gives  principal  weight  to  large  errors.  In 
fire  control  we  are  normally  interested  only  in 
shots  which  are  close  enough  to  register  as  hits. 
If  a  shot  misses  it  makes  little  difference 
whether  the  miss  is  large  or  small.  The  merits 
of  the  least  squares  criterion  are  considered 
in  more  detail  in  Chapter  9,  where  the  conclu- 
sion is  reached  that  the  criterion  is  probably 
adequate  for  many  problems  but  needs  to  be 
supplemented  or  replaced  in  others,  including 
the  special  case  of  heavy  antiaircraft  fire  to 
which  particular  attention  is  given  in  this 
monograph.  Pending  the  discussion  in  Chapter 
9,  the  least  squares  criterion  will  be  assumed 
to  be  a  valid  one,  with  the  understanding  that 
the  analysis  is  intended  primarily  for  its  value 
in  contributing  to  the  general  understanding  of 
the  data-smoothing  problem  rather  than  as  a 
means  of  fixing  the  exact  proportions  of  an  op- 
timal smoothing  network. 


The  time-series  approach  to  data  smoothing 
is  closely  associated  with  another  which  at  first 
sight  may  seem  quite  different.  This  second 
approach  is  suggested  by  the  procedures  used 
in  communication  engineering.  Here  the  sig- 
nals, be  they  voice,  music,  television,  or  what 
not, are again time series. Instead of dealing
with actual signals varying in a more or less
irregular  and  random  manner  with  time,  how- 
ever, it  is  customary  to  deal  with  their  equiva- 
lent steady-state  components  on  the  frequency 

The  analysis  of  data  smoothing  can  conven- 
iently be  approached  by  supposing  that  both 
the  true  path  of  the  target  and  the  effects  of 
tracking  errors  are  represented,  in  a  similar 
way,  by  their  frequency  spectra.  When  the 
situation  is  presented  in  this  way,  however, 
there  is  an  obvious  analogy  between  the  prob- 
lem of  smoothing  the  data  to  eliminate  or  re- 
duce the  effect  of  tracking  errors  and  the  prob- 
lem of  separating  a  signal  from  interfering 
noise  in  communication  systems.  We  may  take 
as  an  example  of  the  latter  the  transmission 
of  voice  or  music  by  ordinary  radio  over  fairly 
long  distances,  so  that  the  effects  of  static  in- 
terference are  appreciable.  In  such  a  system 
a  reasonable  separation  of  the  desired  signal 
from  the  static  can  be  obtained  by  means  of 
a  filter.  In  a  representative  situation  an  ap- 
propriate filter  might  transmit  frequencies  up 
to perhaps 2,000 or 3,000 cycles per second,c
while  rejecting  higher  frequencies. 

The  choice  of  any  specific  cutoff,  such  as 
2,000  or  3,000  c,  in  the  radio  system  depends 
upon  a  compromise  between  conflicting  consid- 
erations. Both  speech  or  music  and  static  nor- 
mally include  components  of  all  frequencies 
which  can  be  heard  by  the  human  ear.  Thus, 
suppressing  any  frequency  range  below  the 
limits  of  audibility,  at  perhaps  10,000  or  20,000 
c,  will  injure  the  signal  to  some  extent.  The 
intensity  of  the  signal  components,  however, 
diminishes  rapidly  above  2,000  or  3,000  c,  while 
the  energy  of  the  static  interference  is  more 
evenly  distributed  over  the  spectrum.  Thus,  by 
passing only the first 2,000 or 3,000 c, we can
retain  most  of  the  signal  while  rejecting  most 
of  the  noise.  Naturally,  the  exact  dividing  line 
will  depend  upon  the  relative  levels  of  signal 
and  noise  power.  If  the  static  interference  is 
quite weak, for example, it would be worth
while to transmit a considerably wider band
in order to retain a more nearly perfect signal.
If the static level is extremely high, on the
other hand, it would be necessary to transmit a
still narrower band at the cost of greater
mutilation of the signal.

b The review of communication theory given in
Appendix A shows how this equivalence is established by
Fourier or Laplace transform methods.

c In practice, of course, the filtering would probably
take place in the radio-frequency circuits, but it is
more convenient here to think of it occurring in the
demodulated output.

The  separation  of  the  true  path  of  a  target 
from  the  observed  path  including  tracking 
errors,  as  a  preliminary  to  prediction  of  the 
future  position  of  the  target,  presents  an  ap- 
proximately analogous  situation.  Again  the 
spectrum  of  the  "signal"  or  true  path  is  con- 
centrated principally  in  a  low-frequency  band, 
in  most  instances,  while  the  energy  of  tracking 
errors  or  "noise"  appears  principally  at  con- 
siderably higher  frequencies.  Thus  the  two  can 
be  separated  by  a  low-pass  filter.  The  separa- 
tion, however,  is  not  complete  since  some  com- 
ponents of  the  signal  spectrum  extend  into  the 
noise  region.  Thus  the  smoothing  process  must 
be  accompanied  by  some  mutilation  of  the  sig- 
nal, and  the  optimum  compromise  is  again 
attained  from  a  filter  which  transmits  a  rela- 
tively broad  band  when  the  tracking  errors  are 
of  low  intensity  and  a  much  narrower  band 
when  they  are  large. 
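The dependence of the best cutoff on the noise level can be sketched numerically. Everything specific below is an assumption made for illustration: the signal power density is taken to fall off as 1/(1 + (f/f0)^2) with f0 = 0.1 c, the noise density is taken to be flat, and the filter is an ideal low-pass filter with a sharp cutoff. The error charged to a given cutoff is the noise power passed plus the signal power rejected.

```python
def signal_density(f, f0=0.1):
    """Assumed signal power density, concentrated below f0 (cycles per second)."""
    return 1.0 / (1.0 + (f / f0) ** 2)

def total_error(cutoff, noise_density, f_max=5.0, df=0.005):
    """Noise power passed below the cutoff plus signal power rejected above it."""
    steps = int(round(f_max / df))
    error = 0.0
    for k in range(steps):
        f = (k + 0.5) * df  # midpoint of each frequency cell
        error += (noise_density if f < cutoff else signal_density(f)) * df
    return error

def best_cutoff(noise_density, candidates):
    """Candidate cutoff giving the smallest total error."""
    return min(candidates, key=lambda fc: total_error(fc, noise_density))

candidates = [0.05 * k for k in range(1, 61)]  # 0.05 to 3.0 c
results = {n: best_cutoff(n, candidates) for n in (0.001, 0.01, 0.1)}
for noise_density, cutoff in results.items():
    print(f"noise density {noise_density}: best cutoff about {cutoff:.2f} c")
```

Under these assumptions the selected band narrows steadily as the noise density rises, which is the compromise described above: a broad band for weak tracking errors and a much narrower one for strong errors.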

In  these  terms  the  most  obvious  difference 
between  the  data-smoothing  problem  and  the 
static  interference  problem  in  the  radio  system 
is  in  the  order  of  magnitude  of  the  frequencies 
involved.  They  are  roughly  10,000  times  smaller 
in  the  data-smoothing  case.  Thus,  the  typical 
signal  band  in  a  fire-control  system  may  cover 
a  few  tenths  of  a  cycle  per  second,  in  compari- 
son with  a  useful  band  of  2,000  or  3,000  c  in  a 
radio  system,  and  the  spectrum  of  tracking 
errors  or  noise,  with  representative  tracking 
devices,  includes  appreciable  components  up  to 
perhaps  2  or  3  c,  in  comparison  with  a  total 
effective  noise  band  in  the  radio  system  ex- 
tending to  the  limits  of  audibility  at  perhaps 
20,000  c. 

This  analogy  between  data  smoothing  and 
the  filtering  problems  which  appear  in  ordi- 
nary communication  systems  transmitting 
speech  or  music  must  of  course  not  be  carried 
too  far.  For  example,  previous  experience  with 
communication  filters  is  of  no  help  in  fixing  in 
detail  the  cutoff  in  attenuation  characteristic 
of the data-smoothing filter, since in communication
systems these choices depend on psychological
considerations of no relevance in the fire-control
problem. Methods of determining the
best  rules  for  proportioning  a  data-smoothing 
filter,  therefore,  remain  to  be  determined.  We 
may  also  notice  that,  whereas  the  time-series 
approach  was  of  the  data-smoothing  and  pre- 
diction type,  the  filter  approach  emphasizes 
data  smoothing  only.  The  addition  of  the  pre- 
diction function  can  be  expected  to  change  ma- 
terially the  overall  characteristics  of  the  cir- 
cuit. Neither  of  these  remarks,  however,  robs 
the  filter  approach  of  its  value  as  a  simple  way 
of  thinking  about  the  problem  qualitatively. 



The  time-series  and  filter  methods  of  looking 
at  data  smoothing  are  related  to  one  another 
by  the  fact  that  the  autocorrelation  can  be  com- 
puted from  the  amplitude  spectrum,  or  vice 
versa,  by  Fourier  transform  means.  Consider, 
for  example,  the  Fourier  transform  of  the 
autocorrelation.  If  we  make  use  in  particular 
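of (4) we have

$$
\int_{-\infty}^{\infty} \phi_N(\tau)\, e^{-i\omega\tau}\, d\tau
\;\propto\; \int_{-\infty}^{\infty} e^{-i\omega\tau}\, d\tau \int_{-\infty}^{\infty} f(t+\tau)\, f(t)\, dt
= \int_{-\infty}^{\infty} f(t)\, dt \int_{-\infty}^{\infty} f(t+\tau)\, e^{-i\omega\tau}\, d\tau
$$

$$
= \int_{-\infty}^{\infty} f(t)\, e^{i\omega t}\, dt \int_{-\infty}^{\infty} f(t+\tau)\, e^{-i\omega(t+\tau)}\, d\tau
\;\propto\; F^{*}(\omega)\, F(\omega) = |F(\omega)|^{2}, \tag{8}
$$

where

$$
F(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt .
$$
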

F(ω) is of course the steady-state spectrum
of the signal f(t). Equation (8) thus states
that the Fourier transform of φ_N is equal to a
constant  times  the  square  of  the  amplitude  of 
the  steady-state  spectrum.  The  amplitude 
squared  spectrum  is,  however,  a  measure  of 

the  power  per  cycle.  The  relation  is  therefore 
equivalent  to  the  statement  that  the  autocorre- 
lation and  power  spectrum  are  Fourier  trans- 
forms of  each  other. 

Since  we  have  already  established  the  fact 
that  the  mean  square  error  in  prediction  de- 
pends only  on  the  autocorrelation,  this  analysis 
enables  us  to  conclude  immediately  that  the 
mean  square  error  can  also  be  calculated  from 
the  power  spectra  of  the  signal  and  noise.  It 
is  entirely  independent  of  the  phase  relations 
in either signal or noise. The phase characteristic
of the data-smoothing network, which
operates on the signal after a specific wave
shape has been established, is, of course, still
of consequence.
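Both statements can be checked on a short sampled record. The sketch below is a discrete, periodic analogue and not the continuous transform of the text: the sample values, the circular definition of the autocorrelation, and the 1/n normalization are all assumptions made for illustration. It verifies that the transform of the autocorrelation equals the amplitude-squared spectrum, and that a time-reversed record, which has the same amplitudes but different phases, has the same autocorrelation.

```python
import cmath

def dft(x):
    """Plain discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def circular_autocorrelation(x):
    """phi(tau) = (1/n) * sum over t of x[t + tau] * x[t], indices taken mod n."""
    n = len(x)
    return [sum(x[(t + tau) % n] * x[t] for t in range(n)) / n
            for tau in range(n)]

f = [1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 0.0, 2.0]  # arbitrary sample record
n = len(f)

# Transform of the autocorrelation versus amplitude squared (over n).
power = [abs(F) ** 2 / n for F in dft(f)]
phi_transform = dft(circular_autocorrelation(f))
max_mismatch = max(abs(p - q) for p, q in zip(power, phi_transform))

# Time reversal keeps every amplitude but alters the phases.
g = f[::-1]
spectra_differ = max(abs(a - b) for a, b in zip(dft(f), dft(g)))
autocorr_differ = max(abs(a - b) for a, b in
                      zip(circular_autocorrelation(f), circular_autocorrelation(g)))

print(max_mismatch, spectra_differ, autocorr_differ)
```

The first and third printed quantities should be zero to rounding error, while the second is not, so the autocorrelation carries the power spectrum but none of the phase information.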


Thus  far  the  material  which  has  been  pre- 
sented has  been  primarily  mathematical.  It 
has  consisted,  in  other  words,  of  outlines  of 
general  analytical  methods  which  are  available 
for  use  with  the  data-smoothing  problem.  It  is 
also  possible  to  approach  the  problem  in  a 
much  more  concrete  fashion.  It  is  obvious  that 
by  giving  thought  to  the  details  of  the  physical 
characteristics  of  tracking  units  and  targets, 
and  to  the  tactical  situations  with  which  we 
expect  to  deal,  it  should  be  possible  to  draw  a 
number  of  specific  conclusions  about  the  prob- 
lem as  a  whole.  In  a  general  theory  of  the  de- 
sign and  tactical  use  of  fire-control  apparatus 
such  an  approach  might  well  be  a  primary  one. 
It  is  scarcely  possible  to  follow  it  in  detail  in 
the  present  discussion.  The  following  para- 
graphs, however,  indicate  some  of  the  kinds  of 
considerations  which  can  be  brought  into  the 
problem  in  this  way.  It  will  be  seen  that  they 
tend  to  modify  the  strictly  mathematical  ap- 
proach, partly  by  qualifying  to  some  extent  the 
assumptions  made  in  the  mathematics,  and 
partly  by  tending  to  give  much  more  emphasis 
to  particular  aspects  of  the  problem  than  would 
appear  in  a  general  analytic  outline. 

Choice of Coordinates

One  of  the  most  obvious  omissions  in  the 
general  analysis  thus  far  is  any  consideration 
of  the  choice  of  coordinates  in  which  the  data 




smoothing  is  to  take  place.  So  far  as  either 
the  statistical  or  filter  theory  is  concerned,  the 
coordinates  in  the  data  smoother  may  repre- 
sent either  the  original  tracking  data  or  any 
transformation  of  them.  The  fact  that  there  is 
actually  something  to  be  decided  here,  however, 
is  easily  seen  from  the  long-range  antiaircraft 
problem.  The  input  tracking  coordinates  for 
antiaircraft  would  normally  be  azimuth,  eleva- 
tion, and  slant  range.  If  the  airplane  flies  in  a 
straight  line  roughly  overhead,  the  general 
shape  of  the  azimuth  and  the  azimuth  rate  as 
functions  of  time  are  given  by  the  curves  in 
Figure 2. The curves become indefinitely
steeper as the target path approaches the
zenith, and it will be seen that if the approach
is reasonably close, either the azimuth or the
azimuth rate must include a very substantial
amount of high-frequency energy. Since the
possibility of an effective separation between
the signal and noise in the filter approach
depends upon the assumption that the signal
components are of quite low frequency with respect
to the noise, the presence of this high-frequency
energy is evidently serious.

Figure 2. Azimuth and azimuth rate for crossing
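The steepening of the azimuth curves can be put in numbers with a simple geometric sketch. The assumptions are ours: the target flies a straight, level course at constant speed, its ground track passes the director at a horizontal distance called `offset`, azimuth is measured from the direction of closest approach, and the speed of 150 yards per second is merely a round figure.

```python
import math

def azimuth(t, speed, offset):
    """Azimuth in radians; t = 0 is the instant of closest approach."""
    return math.atan2(speed * t, offset)

def azimuth_rate(t, speed, offset):
    """Time derivative of the azimuth, in radians per second."""
    return speed * offset / (offset ** 2 + (speed * t) ** 2)

SPEED = 150.0  # yards per second, an assumed value
for offset in (2000.0, 500.0, 100.0):
    peak_rate = azimuth_rate(0.0, SPEED, offset)
    # The rate falls to half its peak when speed * t equals the offset.
    half_width = offset / SPEED
    print(f"offset {offset:6.0f} yd: peak rate {peak_rate:.3f} rad/s, "
          f"half-width {half_width:.2f} s")
```

The peak rate is speed/offset and the pulse lasts roughly offset/speed, so halving the offset doubles the height of the azimuth-rate pulse and halves its duration. Both effects push energy toward higher frequencies, although the target motion itself is as simple as it could be.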

When  the  target  describes  a  violently  evasive 
path  the  signal  spectrum  must  naturally  in- 
clude substantial  high-frequency  components, 
whatever  the  coordinate  system  may  be.  The 
high-frequency  components  indicated  in  Figure 
2,  however,  are  due  to  the  fact  that  the  target 
path  happens  to  pass  almost  over  the  director 
and  are  essentially  superimposed  upon  the 
high-frequency  components  which  reflect  the 
complexity  of  the  target  path  itself.  It  is  clear 

as  a  matter  of  principle  that  an  acceptable 
coordinate  system  for  data  smoothing  should 
not  introduce  frequency  components  which  de- 
pend upon  such  accidental  factors  as  the  loca- 
tion and  orientation  of  the  coordinate  system. 
The  rectangular  system  mentioned  in  connec- 
tion with  Figure  1  evidently  meets  this  condi- 
tion; so  also  does  the  "intrinsic"  system  de- 
scribed in  the  next  section. 

Physical  Limitations  of  Target  or  Tracker 

We  may  also  approach  the  data-smoothing 
question  by  a  consideration  of  the  motions 
which  are  physically  possible  either  in  the 
target  or  in  the  tracking  device.  In  the  heavy 
antiaircraft  problem,  for  example,  there  are 
substantial physical limitations on the performance
possibilities of present-day aircraft.
We  can  be  quite  sure  that  any  motion  incom- 
patible with  these  limitations  is  necessarily  a 
tracking  error  and  can  be  removed  from  the 
incoming  data.  Naturally,  these  limitations 
must  appear  in  the  power  spectrum  of  the  sig- 
nal if  they  affect  the  mean  square  error  in  pre- 
diction, so  that  their  existence  in  no  way  dis- 
putes the  mathematical  framework  we  have 
set  up.  Consideration  of  the  physical  factors 
which  produce  them,  however,  may  permit 
them  to  be  established  more  easily  or  in  more 
clear-cut  fashion  than  would  be  possible  from 
a statistical examination of target records.

The  limitations  on  airplane  performance 
can  be  stated  most  simply  when  the  motion  of 
the  airplane  is  expressed  in  so-called  intrinsic 
coordinates.  These  are  the  speed  of  the  air- 
plane, its  heading,  and  its  angle  of  dive  or 
climb.  The  maneuvering  possibilities  of  a  con- 
ventional airplane  in  these  three  directions  are 
quite  unequal.  By  banking  sharply  it  can 
maneuver  violently  to  the  right  and  left  and 
thus  make  quick  changes  in  heading.  The  pos- 
sibilities of  maneuvering  up  and  down,  how- 
ever, are  considerably  less,  particularly  for  a 
heavy  airplane,  where  there  are  usually  restric- 
tions on  the  maximum  angle  of  dive  or  climb 
which  can  be  assumed.  The  possibilities  of 
quickly  changing  the  speed  of  the  airplane, 
finally, are almost nil. The thrust of an airplane
propeller is so small in comparison with
the mass of the airplane that only small
accelerations are possible.*

Thus  the  optimum  filters  for  the  three  coor- 
dinates should  be  different.  The  one  for  speed 
can  have  a  very  narrow  band,  since  most  of 
the  signal  energy  for  this  coordinate  occurs  at 
very  low  frequencies.  The  optimum  band  for 
the  angle  of  dive  or  climb,  however,  should  be 
larger  (unless  it  turns  out  that  pilots  seldom 
make  use  of  maneuvering  possibilities  in  this 
direction)  and  the  one  for  the  heading  larger 
still.  In  this  ability  to  discriminate  among  the 
various  possible  directions  of  motion  the  in- 
trinsic coordinate  system  is  evidently  an  im- 
provement even  on  the  rectangular  system. 

Settling  Time 

Another  aspect  of  the  data-smoothing  prob- 
lem which  has  not  been  given  conspicuous  at- 
tention in  the  purely  mathematical  discussion 
is  the  fact  that  in  an  actual  tactical  situation 
questions of elapsed time are of great importance.
Engagements usually begin suddenly
and  last  for  a  comparatively  brief  period,  and 
it  is  important  to  find  a  data-smoothing  scheme 
which  provides  adequate  firing  data  as  quickly 
as  possible  after  an  engagement  starts.  A  situ- 
ation essentially  similar  to  the  beginning  of  an 
engagement  may  also  be  presented  whenever 
the  target  makes  a  sudden  change  of  course  or 
whenever  it  is  necessary  to  shift  from  one 
target  to  another  in  a  given  attacking  body. 
The  time  required  for  a  computer  to  give 
usable  output  data  after  any  of  these  events  is 
its  so-called  "settling  time,"  and  is  one  of  the 
most  important  parameters  of  any  data- 
smoothing  system.  It  is  possible  to  make  rough 
estimates  of  settling  time  by  indirect  means  in 
both  the  statistical  and  filter  theories  of  data 
smoothing,  but  no  explicit  consideration  of 
necessary  time  lapses  appears  in  either  theory. 
Evidently,  the  fundamental  fault  lies  with  the 
"stationary"  assumption. 

*  This  ignores  the  possibility  of  changing  the  speed 
through  gravitational  forces.  Since  these  possibilities 
are  linked  to  the  angle  of  dive  or  climb,  however,  they 
can  be  predicted.  This  has  actually  been  done  in  one 
experimental  computer. 

Effect  of  Human  Factors 

Aside  from  the  conditions  on  target  perform- 
ance which  arise  from  the  physical  character- 
istics of  the  target  itself,  there  are  others 
which  are  due  to  the  fact  that  the  target  is 
under  the  control  of  a  human  being  with  a 
definite  purpose.  The  language  of  the  statistical 
and  filter  methods  is  broad  enough  to  cover 
almost  any  situation.  It  tends  to  suggest,  how- 
ever, that  the  typical  target  paths  with  which 
we  deal  are  the  relatively  structureless  conse- 
quences of  random  physical  forces.  The  inter- 
vention of  purposive  human  behavior,  on  the 
other  hand,  tends  to  give  paths  which  fall  into 
more  or  less  definite  patterns.  A  simple  illus- 
tration is  furnished  by  the  argument  which  is 
frequently  offered  in  defense  of  the  straight 
line  assumption  in  dealing  with  antiaircraft 
defense  against  heavy  bombers.  It  is  contended 
that  while  the  targets  may  in  fact  engage  in 
substantial  evasive  maneuvers  during  most  of 
their  flight,  there  will  always  be  a  substantial 
period  during  the  bombing  run  in  which  they 
must  fly  very  straight  in  order  to  achieve 
bombing  accuracy.  On  the  basis  of  ordinary 
probability  we  would  of  course  expect  substan- 
tial straight  line  segments  quite  infrequently 
if  the  course  as  a  whole  shows  marked  disper- 
sion, and  the  intervention  of  the  human  pilot 
thus  provides  a  higher  degree  of  structure  than 
one  would  expect  in  a  corresponding  situation 
dominated  by  purely  natural  factors. 

A  broader  example  is  furnished  by  a  com- 
parison of  two  airplanes,  or  perhaps  more 
simply  of  two  boats,  one  of  which  is  under  the 
control  of  a  human  operator,  while  in  the  other 
the  steering  controls  are  lashed  in  a  neutral 
position.  Both  boats,  say,  may  be  expected  to 
experience  small  variations  of  course  due  to  the 
random  effects  of  wind  and  waves  upon  them. 
Over  a  short  period  of  time  the  observed  mo- 
tions of  the  two  boats  should  be  substantially 
identical.  In  the  case  of  the  boat  with  the 
lashed  helm  these  random  variations  will  tend 
to  accumulate,  so  that  it  is  possible  to  make  a 
reasonable  prediction  of  the  position  of  the 
boat  for  only  a  comparatively  short  distance 
in  the  future.  In  the  boat  with  the  human 
steersman,  on  the  other  hand,  we  may  expect 
corrections  to  be  applied  as  soon  as  the  random 
effects become large, so that the boat tends to
retain the same general course and it is possible
to predict its position hours or even days
later  from  a  relatively  brief  observation. 
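The contrast between the two boats can be sketched with a one-line model of heading error. The model is an assumption introduced here, not part of the original argument: each time step adds an independent random disturbance of variance `q` to the heading, and the steersman is represented by a factor `a` less than one which removes a fixed fraction of the accumulated error at each step. With the helm lashed, `a` equals one.

```python
def variance_history(a, q, steps):
    """Variance of x after each step of x_next = a * x + w, with var(w) = q."""
    variance = 0.0
    history = []
    for _ in range(steps):
        variance = a * a * variance + q
        history.append(variance)
    return history

STEPS = 1000
lashed = variance_history(a=1.0, q=1.0, steps=STEPS)   # no correction applied
steered = variance_history(a=0.9, q=1.0, steps=STEPS)  # assumed correction factor

print("lashed helm, variance after", STEPS, "steps:", lashed[-1])
print("steersman,   variance after", STEPS, "steps:", steered[-1])
print("steersman limiting value q / (1 - a^2):", 1.0 / (1.0 - 0.9 ** 2))
```

In the uncorrected case the variance grows in proportion to elapsed time, so a prediction is useful only for a short distance ahead. In the corrected case it settles at q/(1 - a^2), which is why a brief observation of the general course remains a fair guide long afterward.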

Neither of these illustrations is inconsistent
with the mathematical framework laid down
earlier in the chapter, in a purely theoretical
sense. For example, the bombing run illustration
merely states that because of the presence
of the human operator there are definite phase
relations in the input signal. As we have seen,
such relations can exist without affecting
computations based on mean square error. The
comparison between the piloted and pilotless
boats can be interpreted as the result primarily
of differences in the signal power spectrum.
In the case of the pilotless boat, for example,
the signal occupies a fairly continuous
low-frequency band, while in the case of the piloted
boat it must be regarded as concentrated very
closely around zero frequency, so that it is
approximately a line spectrum superimposed on
a continuous one. The formal mathematical
theory covers also such cases as these.

The point of this discussion, however, is that
the mathematical theory, although it is
sufficiently general in a formal sense, fails to
differentiate between such situations as those
just described and the more shapeless sort
involving continuous spectra with random
phase relations, even if the special features in
these situations may be the controlling factors
in determining the actual probability of
hitting. If we could believe the bombing run
hypothesis, for example, and had a sufficiently
accurate computer and gun, we could expect
to score a hit in every engagement, no matter
how large the mean square error might be.
More generally, it is probably only the
tendency of targets to exhibit "line spectra" which
prevents the real probability of a kill, small
at best, from becoming microscopic. It is
necessary to lay special emphasis on these factors
in order to keep the overall fire control picture
in perspective.

Last on this list of doubts about the
statistical and filter theories, we may mention the
least squares criterion of accuracy. This was
discussed before, but it is mentioned again as
a matter of emphasis, and because of its close
relation with the factors we have just
discussed. For example, the bombing run
illustration obviously represents one situation in
which the mean square error is not a good
guide to the actual probability of scoring a hit.


Chapter  8 


It was shown in the previous chapter that
both the statistical and filter theory ways of
looking  at  the  data-smoothing  problem  lead 
naturally  to  an  analysis  in  terms  of  the  power 
spectra  of  the  signal  and  noise.  The  phase  rela- 
tions are  not  important  as  long  as  we  accept 
the  mean  square  error  as  a  criterion  of  per- 
formance. The  inadequacies  of  the  mean  square 
criterion  will  finally  force  us  to  abandon  the 
steady-state  attack  in  favor  of  a  direct  analysis 
in  terms  of  the  wave  shapes  of  some  assumed 
signals.  The  steady-state  attack  is  nevertheless 
a  very  useful  one.  This  chapter  will  conse- 
quently continue  the  analysis  from  this  point 
of  view.  It  will  be  assumed  as  heretofore  that 
the  heavy  antiaircraft  problem  is  the  particular 
subject  of  interest. 

A  large  part  of  the  discussion  hinges  upon 
the  conditions  which  must  be  satisfied  by  the 
external  characteristics  of  an  electrical  net- 
work if  it  is  to  be  capable  of  physical  realiza- 
tion in  any  way  whatever.  These  limitations 
and  the  characteristics  which  may  be  postulated 
for  physical  networks  are  decisive  since,  in  the 
absence  of  such  restrictions,  no  limits  could  be 
set  upon  the  performance  which  might  be  ex- 
pected from  data-smoothing  and  predicting 
circuits.  The  facts  about  physically  realizable 
networks  which  we  shall  find  of  most  use  are 
summarized  below,  but  the  reader  not  familiar 
with  this  field  is  urged  to  read  also  the  account 
given in Sections A.9 and A.10, Appendix A.
The  conditions  which  must  be  satisfied  by 
physically  realizable  networks  can  be  stated  in 
either  transient  or  steady-state  terms.  In  tran- 
sient terms  they  are  expressed  most  simply  by 
the  statement  that  the  response  of  a  physical 
network  to  an  impulsive  force  must  be  zero  up 
to  the  time  the  force  is  applied.  Thus  the  net- 
work has  no  power  to  predict  a  purely  arbi- 
trary event.  That  is,  it  has  no  way  of  foresee- 
ing whether  or  not  an  impulse  is  actually  going 
to  be  applied  to  it.  This  characteristic  of  physi- 
cal networks  is  taken  as  a  postulate. 

The steady-state limitations on physical
networks are expressed in terms of their
attenuation and phase characteristics. They may be
derived  either  from  the  transient  specification 
or  from  the  postulate  that  a  physical  network 
must  be  stable.  There  are  no  important  limita- 
tions to  be  placed  upon  the  attenuation  and 
phase  characteristics  of  physical  networks  as 
long as we deal with these characteristics
separately, but there are very severe limitations on
the  phase  characteristic  which  can  be  associated 
with  any  given  attenuation  characteristic  or 
vice  versa.  In  particular,  when  the  attenuation 
characteristic  is  prescribed,  there  is  a  definite 
formula  for  calculating  the  unique  limiting 
phase characteristic with which it may be
associated. This is the so-called "minimum phase"
characteristic  because  any  other  physical  net- 
work having  the  postulated  attenuation  char- 
acteristic must  have  as  great  or  greater  phase 
shift  at  every  frequency.  As  we  shall  see  later, 
this  greater  phase  characteristic  would  corre- 
spond to  longer  lags  in  obtaining  usable  data, 
so  that  the  minimum  phase  characteristic  is 
the  optimum  for  a  data-smoothing  network. 
The  minimum  phase  characteristic  has  the  addi- 
tional important  property  that  not  only  does 
it  specify  the  transfer  admittance  of  a  physical 
network,  but  the  reciprocal  of  that  transfer 
admittance  can  also  be  realized  by  a  physical 

In  addition  to  this  principal  formula  for  the 
relation  between  attenuation  and  phase  there 
are  a  number  of  subsidiary  expressions  for 
special  aspects  of  the  problem.  One  in  partic- 
ular, relating  the  attenuation  to  the  behavior 
of  the  phase  characteristic  in  the  neighborhood 
of  zero  frequency,  is  used  extensively  in  this 

» In  limiting  cases,  such  as  may  be  found  when  the 
transfer  admittance  contains  zeros  or  poles  exactly  on 
the  real  frequency  axis,  the  "physical  structure"  may 
require  such  constituents  as  ideally  nondissipative  re- 
actances, perfect  amplifiers  with  unlimited  gain,  etc. 
This,  however,  is  of  no  consequence  for  the  present 
general  discussion. 







It  is  natural  to  begin  with  a  discussion  of  the 
spectrum  of  a  typical  target  path.  Unfortu- 
nately no  data  on  the  spectra  of  actual  meas- 
ured airplane  paths  exist,  and  the  theoretical 
assumptions  which  may  be  made  about  paths 
of  airplane  targets  are  best  discussed  in  the 
next  chapter.  This  section  consequently  will  be 
confined  to  rather  general  observations  about 
the  problem.  It  will  be  convenient  to  assume 
for  definiteness  that  the  quantities  to  be 
smoothed  are  the  velocity  components  in  Car- 
tesian coordinates. 

The  simplest  point  of  departure  is  furnished 
by  the  conventional  assumption  that  the  target 
flies  in  a  straight  line  at  constant  speed.  If  we 
could  construe  this  assumption  literally,  it 
would  mean  that  the  velocity  spectrum  in  rec- 
tangular coordinates  would  reduce  to  a  single 
line  at  zero  frequency.  In  practice,  of  course, 
the  spectrum  is  not  so  simple.  Even  in  the 
absence  of  deliberate  maneuvering,  the  target 
will  fly  a  slightly  curved  path  because  of 
"wander."  Moreover,  even  if  the  target  could 
fly  exactly  straight,  the  single  line  spectrum 
would  apply  only  to  a  straight  course  in- 
definitely continued.  The  spectrum  becomes 
more  complicated  if  we  consider  the  fact  that 
tracking  must  have  begun  at  some  finite  time 
in  the  past,  or  that  the  target  may  presumably 
change  occasionally  from  one  straight  line 
course  to  another. 

As  a  result  of  both  these  causes,  the  actual 
signal  spectrum  must  be  regarded  as  occupying 
a  band  bordering  on  zero  frequency.  The  distri- 
bution of  energy  in  detail  will,  of  course, 
depend  on  particular  circumstances.  The  band 
has  no  very  well  defined  upper  limit,  but  in 
most  cases  the  great  bulk,  at  least,  of  the 
energy  should  be  below,  say,  one-fourth  or  one- 
fifth  of  a  cycle  per  second.  For  example,  the 
natural  periods  of  a  heavy  airplane,  which  one 
would  expect  to  be  correlated  with  wander,  are 
below this limit. This limit is also sufficient to
include  most  of  the  energy  resulting  from 
changes  in  course  occurring  as  frequently  as 
every  ten  or  twenty  seconds. 

In general, it is to be supposed that the
signal spectrum varies as ω^{-n}, where n may be
1, 2, 3, depending on the frequency range. This
follows from general considerations of the
limitations of airplane performance. Thus, if
we suppose that the velocity changes
discontinuously from time to time, it follows from
general Fourier principles that the amplitude
must vary as ω^{-1}. This is presumably a fair
representation of the actual signal spectrum at
low frequencies. At moderate frequencies,
however, we must take account of the fact that the
velocity can actually be changed rapidly but
not discontinuously, and we consequently
assume that the amplitude begins to vary as
ω^{-2}. Finally, at frequencies of the order of
perhaps one cycle per second one must take
account of the fact that the airplane must bank
in order to turn. Since it takes some time to roll
into the bank, even the acceleration in the
lateral direction cannot be discontinuous, and
consequently the amplitude must begin to vary
as ω^{-3}. The application of such successive
limiting factors in constructing a complete
spectrum is described in more detail in Section A.8
of Appendix A.
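The successive inverse powers of frequency described above follow from repeated integration by parts. The sketch below uses notation which is ours and not that of the text: v(t) is one velocity component, assumed to vanish for large |t|, V(ω) is its Fourier transform, and Δ_k is the size of a single jump in the k-th derivative of v at some instant t_0. It gives only the high-frequency behavior contributed by one such jump.

```latex
% Assumed notation (not from the source): v(t), V(\omega), \Delta_k, t_0.
V(\omega) = \int_{-\infty}^{\infty} v(t)\, e^{-i\omega t}\, dt
          = \frac{1}{i\omega} \int_{-\infty}^{\infty} v'(t)\, e^{-i\omega t}\, dt
          = \frac{1}{(i\omega)^{2}} \int_{-\infty}^{\infty} v''(t)\, e^{-i\omega t}\, dt
          = \cdots

% Jump \Delta_0 in v itself: v' contains \Delta_0\, \delta(t - t_0), so
|V(\omega)| \sim \frac{|\Delta_0|}{\omega}

% v continuous, jump \Delta_1 in the acceleration v':
|V(\omega)| \sim \frac{|\Delta_1|}{\omega^{2}}

% v and v' continuous, jump \Delta_2 in v'':
|V(\omega)| \sim \frac{|\Delta_2|}{\omega^{3}}
```

Each additional order of smoothness imposed by the physical limitations of the airplane thus steepens the falloff of the amplitude by one further power of ω.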

One  other  general  condition  of  the  same  kind 
can be mentioned. It can be shown that the
integral from zero to infinity of log H/(1 + ω²),
where H is the power spectrum, is very
important in determining the properties of a time
series. More explicitly, the integral converges
if the series is essentially statistical, so that we
cannot foretell the future from the past with
absolute certainty. This of course is the case
with an actual signal spectrum in a fire-control
problem. It implies two consequences: first,
that H cannot be zero over any finite band; and
second, that in the neighborhood of infinite
frequency H diminishes slowly enough so that
|log H|/ω → 0.
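Two illustrative spectra show how the integral condition operates. Neither spectrum is taken from the text, and the absolute value of log H is used in the integrand on the assumption that this is the form intended.

```latex
% Condition, as assumed here:
% \int_0^\infty \frac{|\log H(\omega)|}{1+\omega^{2}}\, d\omega < \infty

% (a) H(\omega) = (1+\omega^{2})^{-n}: |\log H| = n \log(1+\omega^{2}),
%     so |\log H| / \omega \to 0 and the integral converges:
\int_0^\infty \frac{n \log(1+\omega^{2})}{1+\omega^{2}}\, d\omega = n \pi \log 2

% (b) H(\omega) = e^{-\omega^{2}}: |\log H| = \omega^{2},
%     so |\log H| / \omega = \omega \to \infty and the integral diverges:
\int_0^\infty \frac{\omega^{2}}{1+\omega^{2}}\, d\omega = \infty
```

A spectrum falling off as any inverse power of frequency, as in case (a), therefore belongs to an essentially statistical series, while one falling off as fast as case (b) does not. A spectrum which vanishes over a finite band fails for the same reason, since log H is then infinite over the band.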


The  spectrum  of  tracking  errors  depends 
largely  upon  the  particular  sort  of  tracking 
equipment  involved.  Broadly  speaking,  optical 
tracking  equipment  (at  least  that  of  the  present 
or  recent  past)  tends  to  produce  tracking  errors 
not  only  of  small  amplitude,  but  also  of  low 
frequency,  so  that  they  are  hard  to  separate 
from  the  signal  spectrum.  Radar  equipment,  of 
the  present  time,  produces  higher-frequency 
errors.  Relatively  high-frequency  errors  are 
particularly  likely  to  be  found  in  very  stiff 
automatic  tracking  radars. 




A  number  of  examples  of  spectra  of  tracking 
errors  are  shown  in  Figures  1,  2,  and  3.  The 
spectra  are  given  directly  in  terms  of  range 
and  angle  errors.  To  make  them  comparable 
with the velocity spectra described previously
it would be necessary to multiply all amplitudes
by ω. In addition, it would of course also be
necessary to multiply the angle rates by some
suitable range in order to compare them
directly with the yards-per-second rates we have
otherwise considered.

Figure 1. Power spectrum of range errors of ex-
(Plot annotations: RMS = 30 YDS; MEDIAN = 0.022 CPS.)

After multiplication by $\omega$, the radar spectra
appear  to  be  about  flat  up  to  perhaps  one  cycle. 
Beyond  that  point  they  no  doubt  drop  off 
slowly,  although  the  accuracy  of  the  data  is  not 
sufficient  to  permit  the  situation  to  be  stated 
very  exactly. 



Figure 2.  Power spectrum of angular height
errors of experimental radar. (RMS = 1.0 mil;
median = 0.53 cps.)

The properties of the signal and noise as we
assume them here can be conveniently
expressed by reference to the theory of so-called
"random noise" functions.b A random noise can
be defined as a function which has a definite
amplitude spectrum but completely random
phase characteristics. The theory of such
functions is well developed because of their frequent
occurrence in physics. It is probable that
neither  our  noise  functions  nor  our  signal  func- 
tions are,  strictly  speaking,  random  noise  ac- 
cording to  this  definition.  Thus,  there  are  proba- 
bly certain  definite  phase  relations  in  our  noise 
functions  because  of  the  physical  character- 
istics of  tracking  devices.  There  is  no  evidence, 
however,  that  any  such  relations  are  important 
enough  to  be  significant  in  the  data-smoothing 
problem,  so  that  we  are  fully  justified  in  iden- 
tifying them  with  random  noise  functions  as 
defined  above.  The  phase  relations  in  the  signal 
are  by  no  means  random.  As  long  as  we  con- 
sider only  the  mean  square  error,  however,  this 
factor  is  immaterial,  and  we  can  replace  the 
actual  signal  by  a  random  noise  function  with 
the  same  power  spectrum  for  purposes  of 

The most familiar example of a random
noise function is furnished by the thermal
voltage across a resistance R. This is a random
noise whose spectrum is constant up to very
high frequencies with the value $P = 4kTR$ ($k$
is Boltzmann's constant and $T$ the absolute
temperature). A second example is black body
radiation. If there is black body radiation in a
space, the electric (or magnetic) field intensity
at a point is a random noise function with

$$P(f) = \frac{8\pi h f^3}{c^3}\,\frac{1}{e^{hf/kT} - 1}$$

according to Planck's law. Random noise functions
also occur in the Schottky effect, in
Brownian motion, and in diffusion and heat
flow problems.

b The fact that we also refer to tracking errors as
"noise" is, of course, merely a coincidence.

Figure 3.  Power spectrum of traverse errors.
(RMS = 1.4 mil; median = 0.31 cps.)

For purposes of analysis, a random noise
function can be thought of as a function made
up of a large number of sinusoidal components,
which are very closely spaced in frequency
and whose phases are completely random.21 231
Thus a random noise can be represented as

$$\sum_{n=1}^{N} a_n \cos(\omega_n t + \phi_n)$$

where $\omega_n = 2\pi n\,\Delta f$, $\Delta f$ being the frequency
difference between adjacent components. The phase
angles $\phi_n$ are random variables which are
independent with a uniform probability distribution
from 0 to $2\pi$. As $\Delta f$ decreases the functions
in this ensemble approach, in a certain sense,
a limiting ensemble, providing the amplitudes
$a_n$ are adjusted properly. What is desired is to
have the total power in the neighborhood of
each frequency approach a certain limit $P(f)$,
the power spectrum at that frequency. To do
this we make

$$a_n^2 = 2P(f)\,\Delta f.$$

In the limiting ensemble the total power within
a small frequency range $\Delta f$ is then $P(f)\,\Delta f$.
The function $P(f)$ completely describes the
random noise ensemble from the statistical
point of view.
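The construction just described can be tried numerically. The following is a minimal sketch, not part of the original analysis: it assumes components at frequencies $n\,\Delta f$ with amplitudes chosen so that each carries power $P(f_n)\,\Delta f$, and it uses a purely hypothetical spectrum $P(f) = 1/(1 + f^2)$. The function names are illustrative only.

```python
import math
import random


def build_components(power_spectrum, delta_f, n_components, rng):
    """Return (amplitude, angular frequency, phase) for each sinusoid."""
    components = []
    for n in range(1, n_components + 1):
        f_n = n * delta_f
        # Amplitude chosen so that the component power a_n^2 / 2 equals P(f_n) * delta_f.
        a_n = math.sqrt(2.0 * power_spectrum(f_n) * delta_f)
        phi_n = rng.uniform(0.0, 2.0 * math.pi)  # independent uniform phase
        components.append((a_n, 2.0 * math.pi * f_n, phi_n))
    return components


def noise_value(components, t):
    """Value of the synthesized wave at time t."""
    return sum(a * math.cos(w * t + phi) for a, w, phi in components)


def time_average_power(components, delta_f, samples=1000):
    """Mean square of the wave over one full period 1/delta_f."""
    period = 1.0 / delta_f
    return sum(noise_value(components, j * period / samples) ** 2
               for j in range(samples)) / samples


# Hypothetical spectrum, chosen only for illustration.
spectrum = lambda f: 1.0 / (1.0 + f * f)
delta_f, n_components = 0.1, 200          # components from 0.1 to 20 cps
components = build_components(spectrum, delta_f, n_components, random.Random(0))

ensemble_power = sum(a * a / 2.0 for a, _, _ in components)   # sum of P(f_n) * delta_f
measured_power = time_average_power(components, delta_f)
```

Under these assumptions the time average along the single synthesized wave agrees with the sum of the component powers whatever phases are drawn, and both approach the area under the assumed spectrum as the spacing is reduced.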

A  particularly  important  special  case  is  that 
of  a  random  noise  with  a  constant  power  spec- 
trum. This  is  often  called  "flat"  or  "white" 
noise.  True  constancy  out  to  infinite  frequencies 
is  of  course  impossible  since  it  would  imply  an 
infinite  total  power  in  the  function.  The  idea 
is,  however,  still  useful  and  can  be  approxi- 
mated, as  with  resistance  noise,  by  having  a 
spectrum  which  is  constant  out  to  such  high 
frequencies  that  behavior  beyond  this  point  is 
of  no  importance  to  the  problem.  We  may  con- 
veniently think  of  flat  random  noise  as  being 
made  up  of  a  succession  of  weak  impulses  oc- 
curring frequently  but  at  random  times  with 
respect  to  one  another.  This  results  from  the 
fact  that  a  Fourier  analysis  of  a  single  impulse 
gives  a  flat  spectrum,  and  the  random  occur- 
rence of  many  of  them  produces  a  random  set 
of  phases.  In  a  physical  problem,  such  as  resis- 
tance noise  or  Brownian  motion,  these  im- 
pulses might  correspond  to  the  effects  of  indi- 
vidual small  particles.  Such  a  situation  is  of 
course  completely  chaotic.  If  the  impulses  are 
large  and  occur  relatively  infrequently,  the 
power  spectrum  is  still  flat,  though  the  func- 
tion is  no  longer  a  random  noise  function  as 
defined  here.  This  conception,  which  corre- 
sponds to  a  physical  situation  including  definite 
causative  elements,  will  be  revived  later  under 
the  name  of  the  elementary  pulse  method  of 

Random noise functions have a number of
interesting characteristics. For example, they
have the "ergodic property." This means that
averaging a statistic along the length of a
particular random function gives the same results
as averaging the same statistic over an
ensemble of functions having the same power
spectrum. Each function is typical of the
ensemble. To be more precise one must admit
exceptions, but the probability of an exception
is zero. For example, if we determine the fraction
of time a given random function $f(t)$ has
a value greater than some constant $A$, it will
be equal to the fraction of all functions in the
ensemble which are greater than $A$ at $t = 0$
(with probability 1).

A  second  characteristic  of  random  noise 
functions  is  the  fact  that  they  frequently  lead 
to  Gaussian  or  normal  law  distributions.  For 
example, the amplitudes of a random noise
function are distributed about zero in accordance
with the normal error law. Likewise, the
amplitudes  for  two  points  spaced  a  given  dis- 
tance apart  form  a  two-dimensional  normal 
error  law  distribution  when  we  consider  all 
possible  positions  of  the  first  point.  It  is  ap- 
parent that  if  the  signal  and  noise  are  actually 
random  functions  the  mean  square  error  is  as 
good  a  criterion  of  performance  as  any  other, 
since  it  completely  fixes  the  distribution  in  a 
normal  law  case. 

A  final  property  of  random  noise  functions 
is  the  fact  that  if  a  random  noise  is  passed 
through  a  filter  the  output  is  still  a  random 
noise. If the power spectrum of the noise is
$P(\omega)$ and the transfer characteristic of the
filter is $Y(i\omega)$, the output spectrum is
$P(\omega)\,|Y(i\omega)|^2$. In particular, if we take the
derivative of a random noise with spectrum
$P(\omega)$ we obtain one with spectrum $\omega^2 P(\omega)$.
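The filtering rule can be checked on a single component. The sketch below is illustrative only: it assumes a flat input spectrum and treats differentiation as a filter with transfer characteristic $i\omega$. The helper names are hypothetical.

```python
import math


def output_spectrum(power_spectrum, transfer):
    """Spectrum after a linear filter: P(w) * |Y(iw)|^2."""
    return lambda w: power_spectrum(w) * abs(transfer(1j * w)) ** 2


def mean_square(func, period, samples=2000):
    """Average of func(t)^2 over one period, by uniform sampling."""
    return sum(func(j * period / samples) ** 2 for j in range(samples)) / samples


differentiator = lambda s: s                 # Y(iw) = iw
flat = lambda w: 1.0                         # hypothetical flat input spectrum
rate_spectrum = output_spectrum(flat, differentiator)

# One component a*cos(wt) and its derivative -a*w*sin(wt).
a, w = 0.7, 3.0
period = 2.0 * math.pi / w
power_in = mean_square(lambda t: a * math.cos(w * t), period)
power_out = mean_square(lambda t: -a * w * math.sin(w * t), period)
```

The ratio `power_out / power_in` equals the square of the angular frequency, which is the factor by which the differentiator multiplies the spectrum at that frequency.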

This last property of random noise functions
suggests a method of representing them which
we shall find useful in the future. The method
is represented by Figure 4. It consists of a
source of flat noise followed by a shaping filter
to give the desired power spectrum. We can
easily assign to the filter the characteristics of
a physically realizable structure by making use
of the relations between attenuation and phase
mentioned earlier in the chapter. It is merely
necessary to convert the desired power spectrum
into a specification of the attenuation
characteristic of the filter and then use the
loss-phase formula to compute the corresponding
phase shift. It will be assumed that this
procedure has been followed when we make use
of this circuit at a later point.

Figure 4.    Circuit representation of random

The method of representing random functions
shown by Figure 4 illustrates graphically
the  basis  of  the  prediction  schemes  described 
thus  far.  The  flat  noise  is  of  course  absolutely 
unpredictable.  The  history  of  the  function  up 
to  any  given  instant  gives  no  indication  of  its 
value  even  a  microsecond  later.  The  filter,  how- 
ever, forces  the  output  current  to  have  a  cer- 
tain structure  on  which  a  prediction  may  be 
based.  For  example,  if  the  filter  will  pass  only 
very  low  frequencies  it  is  clear  that  the  output 
can  change  very  little  in  a  microsecond. 
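A discrete-time analogue makes the point concrete. This is a sketch under stated assumptions rather than the circuit of Figure 4: the shaping filter is taken to be a hypothetical one-pole low-pass with impulse response $h_k = a^k$, and the flat noise is a sequence of uncorrelated samples. The correlation between successive output samples then follows from the impulse response alone.

```python
def impulse_response(a, length):
    """One-pole low-pass h[k] = a**k (hypothetical shaping filter)."""
    return [a ** k for k in range(length)]


def correlation(h, lag):
    """Normalized autocorrelation, at the given lag, of flat noise passed through h."""
    r0 = sum(x * x for x in h)
    r_lag = sum(h[k] * h[k + lag] for k in range(len(h) - lag))
    return r_lag / r0


flat = [1.0, 0.0]                             # unit impulse: no shaping at all
smooth = impulse_response(0.99, 5000)         # passes only low frequencies

rho_flat = correlation(flat, 1)               # successive flat-noise samples
rho_smooth = correlation(smooth, 1)           # successive filtered samples
# Fraction of the variance left unexplained when the next sample is
# estimated linearly from the latest one.
unpredicted_fraction = 1.0 - rho_smooth ** 2
```

With no shaping the correlation is zero and the latest sample tells us nothing about the next. With the assumed low-pass filter the correlation is close to unity and only about two per cent of the variance remains unpredicted one step ahead.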


The  signal  and  noise  spectra  furnish  the  raw 
material  from  which  a  suitable  data-smoothing 
filter  can  be  deduced.  We  have  still  to  deter- 
mine, however,  the  exact  rule  for  choosing  the 
cutoff  and  attenuation  characteristic  of  the 
filter  from  these  spectra.  It  is  clear  that  previ- 
ous experience  with  signal-to-noise  problems 
in systems transmitting voice or music is no
help,  since  the  filter  proportions  here  depend 
upon  psychological  considerations  of  no  rele- 
vance to  the  fire-control  problem.  For  example, 
the  interfering  effect  of  a  small  amount  of 
noise  is  much  greater  than  one  might  expect 
from  energy  considerations,  especially  in  in- 
tervals of  low  message  level,  and  it  is  con- 
sequently worth  while  to  maintain  a  relatively 
high  level  of  attenuation  in  the  noise  band. 
Conversely,  the  breadth  of  the  band  required 
for  the  message  depends  as  much  on  the  ability 
of  the  ear  to  reconstruct  a  complete  signal 
from  an  incomplete  one  as  it  does  upon  the 
actual  signal  power  spectrum. 

In  the  data-smoothing  case  a  suitable  crite- 
rion, dependent  upon  more  physical  considera- 
tions, can  be  obtained  by  minimizing  the  rms 
error  at  the  filter  output.  This  criterion  is 




easily  developed  from  the  power  spectrum  ap- 
proach, and  in  a  sense  it  is,  of  course,  the  only 
possible  one  as  long  as  we  follow  the  methods 
developed  thus  far. 

A very general theory for the minimization
of the rms error of the filter output has been
developed by Wiener.1 Since the power spectrum
approach is not the one we shall eventually
follow, however, it is not necessary to give
this analysis in detail. The nature of the
relationships can be seen from an elementary
computation. In Figure 5 let OA be a unit
vector representing the signal component at
some particular frequency. Let the amplitude
ratio between the input and output of the
data-smoothing filter be $x$, and let it be assumed
that the system is phase distortionless. This can
always be accomplished, at the cost of lag, by
phase equalization. Then the actual signal
output can be represented by OB, where
$OB/OA = x$. Let the ratio of noise power to
signal power at this frequency be $k^2$. Then the
output noise can be represented by the vector
BC, at some arbitrary phase angle $\theta$, where
$BC/OA = kx$.

Figure 5.    Vector relation between input and
output of data-smoothing network.

The error in the output of the data-smoothing
filter is evidently represented by the vector
AC. We have

$$(AC)^2 = (OA)^2\left[(1 - x - kx\cos\theta)^2 + (kx\sin\theta)^2\right]$$

$$\qquad\; = (OA)^2\left[(1 - x)^2 - 2kx(1 - x)\cos\theta + k^2x^2\right].$$

Since $\theta$ is random the cross-product term
involving $\cos\theta$ disappears on the average. (More
generally, it disappears as long as the noise and
signal are uncorrelated, whether or not their
relative phases are entirely random.) This
leaves the mean square error as

$$\overline{(AC)^2} = (OA)^2\left[1 - 2x + (1 + k^2)x^2\right]. \tag{1}$$

The mean square error is a minimum if

$$x = \frac{1}{1 + k^2} = \frac{P_S}{P_N + P_S}$$

where $P_S$ and $P_N$ are, respectively, the signal
and noise power at this frequency. Upon
substituting this result in equation (1) and
remembering that $(OA)^2 = P_S$, we find that the
minimum mean square error is

$$\frac{P_S P_N}{P_S + P_N}. \tag{2}$$

Equation (2) evidently represents the sought-for
rule for the filter transmission characteristic.
It is illustrated in Figure 6, where $P_N$
and $P_S$ have been chosen respectively as the
flat curve and the $1/\omega^2$ curve in Figure 7. In
comparison with the characteristics of typical
filters in communication systems it is quite
rounded with a relatively slowly falling amplitude
characteristic. More important than the
detailed rule for the transmission characteristic,
however, is the conclusion that the shape
of the characteristic is not very critical. There
is very little loss in replacing the actual curve
in Figure 6 by any other similar characteristic.
For example, we might validate the
assumption of zero phase distortion by making
use of the curve which automatically gives a
linear phase shift.150

Figure 6.  Optimum transmission characteristic
for data smoothing assuming signals with random
noise characteristics.

Figure 7.  Signal and noise spectra assumed
in Figure 6.

A  more  extreme  illustration  is  furnished  by 
the  infinitely  selective  filter  characteristic,  with 
perfect  transmission  in  the  range  in  which  the 
signal  power  is  greater  than  the  noise  power, 
and  zero  transmission  elsewhere,  indicated  by 
the  broken  lines  in  Figure  6. 

It  follows  from  equation  (1)  that  in  the 
neighborhood of the cutoff point $\omega_0$ the mean
square  error  for  this  filter  is  twice  that  of  the 
optimum  structure.  In  most  frequency  ranges, 
however,  the  penalty  is  far  less  than  this.  Since 
even  a  two-to-one  change  in  the  mean  square 
error  would  produce  no  tremendous  improve- 
ment in  the  effectiveness  of  fire,  it  is  clear  that 
the  result  to  which  we  are  led  by  this  method 
of  attack  is  by  no  means  critical. 
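The size of the penalty is easily tabulated. The sketch below is an illustration under stated assumptions: it takes the mean square error at one frequency to be $P_S[1 - 2x + (1 + k^2)x^2]$ with $k^2 = P_N/P_S$, and it assumes a flat noise spectrum and a $1/\omega^2$ signal spectrum, so that the cutoff falls at $\omega = 1$. The function names are hypothetical.

```python
def mean_square_error(x, p_signal, p_noise):
    """Mean square error at one frequency for amplitude ratio x."""
    k2 = p_noise / p_signal
    return p_signal * (1.0 - 2.0 * x + (1.0 + k2) * x * x)


def optimum_gain(p_signal, p_noise):
    """Amplitude ratio that minimizes the mean square error."""
    return p_signal / (p_signal + p_noise)


def sharp_cutoff_gain(p_signal, p_noise):
    """Pass fully where the signal dominates, block elsewhere."""
    return 1.0 if p_signal > p_noise else 0.0


def penalty(p_signal, p_noise):
    """Ratio of sharp-cutoff error to optimum error at one frequency."""
    best = mean_square_error(optimum_gain(p_signal, p_noise), p_signal, p_noise)
    sharp = mean_square_error(sharp_cutoff_gain(p_signal, p_noise), p_signal, p_noise)
    return sharp / best


# Assumed spectra for illustration: flat noise P_N = 1, signal P_S = 1/w^2.
for w in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(w, round(penalty(1.0 / w ** 2, 1.0), 4))
```

Under these assumptions the ratio reaches two only where the signal and noise powers are equal and falls toward unity on either side, which is the sense in which the result is not critical.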


The analysis just concluded has been directed
at the amplitude characteristics of a
data-smoothing filter. By virtue of the relations
between the amplitude and phase characteristics
of physical networks mentioned earlier in the
chapter, however, the analysis permits us to
give at least a partial description also of the
phase characteristics of the filters. This is an
important consideration because it bears upon
the question of time delays in data-smoothing
systems which was mentioned in Chapter 7.

Figure 8.    Some filter attenuation characteristics.

The  general  nature  of  the  relationship  in 
simple cases is illustrated by Figures 8 and 9.

Figure 9.  Corresponding minimum phase characteristics.

Figure  8  shows  a  series  of  rising  attenuation 
characteristics  equivalent  to  rather  unselective 
falling  amplitude  characteristics  of  the  general 
type  shown  by  the  principal  curve  in  Figure  6. 
Figure  9  shows  the  corresponding  phase  char- 
acteristics computed  on  a  minimum  phase  shift 
basis.  In  Figure  8  the  central  attenuation  char- 
acteristic B  has  been  so  chosen  that  the  corre- 
sponding phase  characteristic  in  Figure  9  is 
exactly  a  straight  line  at  low  frequencies, 
where  the  transmitted  amplitudes  are  appreci- 
able. Curves  A  and  C  in  the  two  drawings  show 
slightly  different  cases,  but  it  is  clear  from 
the  figures  that  the  tendency  of  the  phase 
characteristics  to  approximate  linearity  is  still 

In communication engineering a phase characteristic
proportional to frequency is interpreted
as indicating a delay in seconds equal to
the slope $dB/d\omega$ of the phase characteristic.
This relation is illustrated most simply by an
ideal line. The ideal line has zero attenuation
combined with a phase shift which is proportional
to frequency and which at any given frequency
is also proportional to the length of the
line in question. If we apply any arbitrary
wave to the line it is propagated down the line
with a definite velocity and unchanged wave
form. The time required for the wave to reach
any point on the line is equal to the slope of the
phase characteristic to that point.
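The equivalence between a linear phase characteristic and a pure delay can be verified directly. In the sketch below, which is illustrative only, an arbitrary wave is represented by three hypothetical sinusoidal components, and each is given a phase lag proportional to its frequency with no change in amplitude.

```python
import math


def wave(components, t):
    """Sum of sinusoids given as (amplitude, angular frequency, phase)."""
    return sum(a * math.cos(w * t + phi) for a, w, phi in components)


def through_ideal_line(components, slope):
    """Apply a phase lag B(w) = slope * w to every component, with no attenuation."""
    return [(a, w, phi - slope * w) for a, w, phi in components]


# Hypothetical three-component input wave.
source = [(1.0, 0.5, 0.3), (0.6, 1.7, 1.1), (0.2, 4.0, 2.0)]
slope = 1.5                                   # slope of the phase characteristic, in seconds
received = through_ideal_line(source, slope)
```

At every instant `wave(received, t)` coincides with `wave(source, t - slope)`: the wave form is unchanged and arrives later by the slope of the phase characteristic.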

In a structure like a filter, which has an
attenuation characteristic varying with frequency,
it is of course no longer possible to
transmit an arbitrarily impressed wave without
change in wave shape. Even if the applied
wave is merely a suddenly applied d-c voltage
or single frequency sinusoid, there is a transient
period before the response approximates
its final value. In structures having a substantially
linear phase characteristic over any frequency
range in which they exhibit an appreciable
amplitude response, however, this total
transient characteristic falls naturally into two
parts. The first is a waiting period equal to the
slope of the phase characteristic, during which
the response is very small, whereas the second
is a true transient period in which the response
is substantial but does not resemble the final
steady-state response. This is illustrated by
Figure 10 which shows the voltage at the fifth
section of a conventional low-pass filter in
response to a d-c voltage applied at zero time
at the input terminals.1" The end of the waiting
period, as deduced from the slope of the phase
characteristic, is indicated by the broken line.

Figure 10.    Voltage at fifth section of conventional
low-pass filter in response to unit d-c voltage.

Delays of the sort just illustrated must be
expected in a data-smoothing filter whenever
the nature of the signal is changed. This happens
at the beginning of tracking, in changing
from one target to another, or even in following
a single target when the target makes an
abrupt change in course. Since usable data in
a fire-control system must be quite accurate,
the delay to be allowed for must include both
the initial waiting period and the subsequent
transient period until the transient ripples
have almost vanished. A considerable part of
the art of designing data-smoothing networks
consists in controlling the design so that these
final transient ripples decay relatively rapidly.
We are not yet ready to discuss this problem.
It will turn out, however, that the minimum
interval which can be assigned to the "true
transient" period is about equal to that which
must be allowed for the initial waiting period.c
Thus the slope of the phase characteristic can
be used as an index of the lags which must be
expected in data smoothing merely by doubling
the delay to which the slope would normally be
said to correspond.

c This is not intended to imply that the distinction
between the initial waiting period and the "true
transient" period is quite as sharp as it is in Figure 10. The
selectivity in a data-smoothing filter is usually not
great enough to justify the assumption that components
beyond the linear phase region are of negligible
importance.

When we use the phase slope as an index of
delay it becomes immediately apparent that
lags are the necessary consequence of smoothing
in physical circuits. This is easily seen by
reference to the relations which must exist
between attenuation and phase characteristics in
physical structures. An example is provided by
the formula15*1

$$\left(\frac{dB}{d\omega}\right)_{\omega = 0} = \frac{2}{\pi}\int_0^\infty \frac{A - A_0}{\omega^2}\,d\omega \tag{3}$$

where $A$ is attenuation, $A_0$ is the attenuation
at zero frequency, and $B$ is phase shift. In other
words, the delay (measured by the slope of the
phase characteristic at zero frequency) is
proportional to the integral of the attenuation on
an inverse frequency scale when the attenuation
at zero frequency is taken as the reference.
The equation thus states that the system will
exhibit a lagging response as long as there is a
net high-frequency attenuation. As a numerical
illustration, let it be supposed that $A$ is zero
below $\omega = 1$. This corresponds to the estimate
made earlier in the chapter that the input signal
components in antiaircraft work lie roughly
in the band below about 0.1 or 0.2 cycle per
second. Let it be supposed also that $A$ at higher
frequencies is equal to 3 nepers, corresponding
to an average amplitude reduction of about 20
to 1. Then $dB/d\omega$ at the origin is given from
equation (3) as $6/\pi$ seconds, and in accordance
with the rule just enunciated the minimum
delay to be expected from such a structure in a
data-smoothing application would consequently
be $12/\pi$ seconds.
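The numerical illustration can be reproduced by direct integration. The sketch below assumes that the attenuation-phase relation takes the form $(dB/d\omega)_{\omega=0} = (2/\pi)\int_0^\infty (A - A_0)\,\omega^{-2}\,d\omega$, and carries out the integration in the variable $u = 1/\omega$, that is, on an inverse frequency scale. The function name and the truncation point `u_max` are illustrative choices, not part of the original analysis.

```python
import math


def zero_frequency_delay(attenuation, a0, u_max=10.0, steps=100000):
    """(2/pi) * integral of (A - A0) over u = 1/w, midpoint rule on (0, u_max).

    Assumes the attenuation equals a0 for all w below 1/u_max.
    """
    du = u_max / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * du
        total += (attenuation(1.0 / u) - a0) * du
    return 2.0 / math.pi * total


# Attenuation in nepers: zero below w = 1, three nepers above.
step_attenuation = lambda w: 0.0 if w < 1.0 else 3.0

phase_slope = zero_frequency_delay(step_attenuation, 0.0)   # seconds
expected_lag = 2.0 * phase_slope                            # doubling rule
```

Under these assumptions `phase_slope` comes to about 1.9 seconds and `expected_lag` to about 3.8 seconds.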

Aside  from  such  specific  quantitative  rela- 
tions equation  (3)  is  useful  as  a  basis  for  a 
number  of  important  qualitative  conclusions. 
One,  for  example,  is  the  fact  that  although  a 
lag  is  a  necessary  concomitant  of  any  system 
showing  a  high-frequency  attenuation,  the 
amount  of  the  lag  depends  greatly  upon  the 
portion  of  the  frequency  spectrum  in  which 
the  attenuation  is  found.  Since  the  integral  is 
taken  on  an  inverse  frequency  scale,  a  small 
attenuation  at  low  frequencies  is  much  more 
important  than  a  considerably  greater  attenua- 
tion further  out  in  the  spectrum.  This  points  to 
the  desirability  of  designing  tracking  instru- 
ments which  generate  principally  high-fre- 
quency noise,  even  if  the  amplitude  of  the  noise 
is  somewhat  increased  thereby.  We  may  also 
notice  that  since  the  attenuation  is  a  logarith- 
mic function  of  amplitude  an  initial  moderate 
reduction  in  the  amplitude  of  disturbing  noise 
may  be  much  less  expensive  in  lag  than  subse- 
quent attempts  at  further  reduction.  For  ex- 
ample, an  amplitude  reduction  from  100  to  10 
per  cent  over  a  given  portion  of  the  frequency 
spectrum  produces  no  more  lag  than  a  subse- 
quent reduction  from  10  to  1  per  cent. 
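Both conclusions can be illustrated with a band of constant attenuation. The sketch assumes, as an illustration only, that an attenuation of $A$ nepers between $\omega_1$ and $\omega_2$ contributes $(2/\pi)\,A\,(1/\omega_1 - 1/\omega_2)$ seconds to the zero-frequency phase slope. The band edges and attenuations below are hypothetical.

```python
import math


def band_lag(attenuation_nepers, w_low, w_high):
    """Delay contributed by constant attenuation on [w_low, w_high] rad/s."""
    return 2.0 / math.pi * attenuation_nepers * (1.0 / w_low - 1.0 / w_high)


def nepers(amplitude_ratio):
    """Attenuation in nepers for an amplitude reduction by the given ratio."""
    return math.log(amplitude_ratio)


low_band = band_lag(0.5, 0.5, 1.0)     # mild attenuation at low frequencies
high_band = band_lag(3.0, 5.0, 10.0)   # heavy attenuation farther out
first_cut = band_lag(nepers(100 / 10), 1.0, 2.0)   # 100 to 10 per cent
second_cut = band_lag(nepers(10 / 1), 1.0, 2.0)    # 10 to 1 per cent
```

In this example half a neper near the signal band costs more lag than three nepers a decade higher, and the two successive ten-to-one reductions cost exactly the same lag.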


In  Chapter  7  we  distinguished  between  what 
we  called  the  simple  data-smoothing  problem 
and  the  data-smoothing  and  prediction  prob- 
lem. The  simple  problem,  with  which  this  re- 
port is  chiefly  concerned,  is  the  one  which  has 
been  given  principal  attention  thus  far.  On 
account  of  its  broad  interest,  however,  it  seems 
worth  while  to  include  also  a  brief  statement 
of  Wiener's  solution  of  the  general  problem. 
The  method  of  development  used  here  is  intui- 
tive and  nonrigorous  in  comparison  with 
Wiener's  own  development,  but  it  permits  the 
principal  relations  to  be  established  by  very 
elementary  means. 

It is convenient to consider first the zero
noise case. The past history of the signal, then,
is known perfectly, and the existence of a
prediction problem depends entirely upon the
fact that since the signal is assumed to be
statistical in character, its future is not completely
determined from its past. The situation
can be thought of in the terms suggested by
Figure 11. The actual signal output appears at
$P_1$. In accordance with the discussion earlier
in the chapter, we imagine this signal to be
generated by passing flat noise through the
shaping network $N_1$. The transfer admittance
$Y_1(i\omega)$ of $N_1$ is determined from the power
spectrum of the signal by the procedure outlined
earlier and is a minimum phase shift characteristic.
It will be recalled that minimum
phase shift transfer admittances have the important
property that their reciprocals are also
the transfer admittances of physically realizable
networks.

Figure 11.  Schematic representation of Wiener's
prediction theory when there is no noise.

From $Y_1$ we can readily compute the transient
response characteristic of $N_1$. We shall
assume for illustrative purposes that the impulsive
admittance of $N_1$ takes the special
shape shown by Figure 12.

Figure  12.  Assumed  impulsive  admittance  of 
shaping  filter. 

The flat noise is thought of as consisting of
a large number of elementary impulses with
random amplitudes and occurring at random
times. For the purposes of this analysis, however,
it is sufficient to consider only the three
unit impulses shown in Figure 13. Impulse B
is supposed to occur at the instant at which
the prediction is to be made, A occurs two seconds
in the past, and C, one second in the
future. The response of $N_1$ to these three impulses
will evidently be three curves of the
sort given by Figure 12, suitably displaced in
time as shown by Figure 14.

Figure 13.  Impulses giving rise to applied signal
through shaping filter.

The desired output of the predicting network
is the curve of Figure 14 advanced by the prediction
time, which we can assume, for illustration,
to be two seconds. It may be assumed
for the sake of preliminary analysis that the
input of the predicting network is the three
original impulses of Figure 13. The terminal
$P_3$ at which they are supposed to appear is of
course a purely fictitious one and is not accessible
to us physically. We can, however, construct
the equivalent terminal $P'_3$ by imposing
the actual signal from terminal $P_1$ on the network
$N_2$, whose transfer admittance is the
reciprocal of that of $N_1$.

Figure 14.   Applied signal at $P_1$.

Let the predicting network connected to terminal
$P'_3$ be represented by $N_3$. Obviously a
perfect prediction would be secured if $N_3$ could
be assigned the impulsive admittance shown in
Figure 15, that is, an impulsive admittance
equal to the impulsive admittance of the original
network but moved forward by the 2-second
prediction time. Then all the constituent curves
and the sum curve in Figure 14 would similarly
be moved forward. Of course we cannot assign
$N_3$ an impulsive admittance which is different
from zero at negative times without postulating
a nonphysical network. It is, however, perfectly
possible to define $N_3$ from the portion of
the impulsive admittance characteristic at positive
times, with the remainder set equal to
zero. This gives an impulsive admittance of
the type shown by Figure 16. When energized
by the three unitary impulses, it gives the
result shown in Figure 17. The contributions
of impulses A and B are not affected by the
absence of a negative time portion of the impulsive
admittance, but the contribution of impulse
C is lost.

Figure 15.  Ideal impulsive admittance of prediction
network $N_3$ in Figure 11.
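The truncation can be sketched for a simple case. The example below is illustrative only: it assumes a hypothetical exponential impulsive admittance $W(t) = e^{-t}$ for the shaping filter in place of the curve of Figure 12, advances it by the prediction time, and discards the part that would fall at negative times. The energy of the discarded part, which is what impulses such as C would have contributed, is computed numerically.

```python
import math


def realizable_response(w, alpha):
    """Impulse response of the predictor: W advanced by alpha, kept only for t >= 0."""
    return lambda t: w(t + alpha) if t >= 0.0 else 0.0


def discarded_energy(w, alpha, steps=20000):
    """Integral of W(tau)^2 over 0 <= tau <= alpha, by the midpoint rule."""
    d = alpha / steps
    return sum(w((i + 0.5) * d) ** 2 for i in range(steps)) * d


# Hypothetical shaping-filter response, not the curve of Figure 12.
shaping = lambda t: math.exp(-t) if t >= 0.0 else 0.0
alpha = 2.0                                      # prediction time in seconds
predictor = realizable_response(shaping, alpha)

lost = discarded_energy(shaping, alpha)
total = 0.5                                      # integral of exp(-2t) over t >= 0
```

For this assumed response the realizable part is simply the original response reduced by the factor $e^{-\alpha}$, so the best prediction is an attenuated copy of the present signal, and nearly all of the energy (`lost / total` is about 0.98) lies in the discarded portion when the prediction time is two seconds.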

To  formulate  a  physical  prediction  network 

2         0  < 

\  A 

Figure  16.  Realizable  portion  of  required  im- 
pulsive admittance. 




we  have  merely  to  find  by  conventional  meth- 
ods the  steady-state  admittance  Y,  corre- 
sponding to  the  impulsive  admittance  of  Figure 
16.  The  two  networks  AT,  and  A7;1  may  then  be 

in  the  manner  shown  by  Figure  18.  The  first 
source  of  flat  noise,  together  with  the  shaping 
network  N,„  is  the  combination  we  have  already 
used  to  represent  the  signal  in  the  noise-free 

-2         0         2         4         6  8 

Figure  17.    Response  of  realizable  prediction  net- 

combined  to  give  a  single  structure  with  the 
transfer  admittance  Y,Y:  =  YJY,  which  will 
give  the  complete  prediction  when  energized  by 
the  actual  signal. 

The  mean  square  error  in  prediction  is 
easily  determined  from  the  fact  that  the  con- 
tributions of  all  impulses  of  the  sort  repre- 
sented by  C,  occurring  in  the  prediction  in- 
terval, are  lost.  Since  impulses  in  the  flat  noise 
source  occur  at  random  times  the  mean  square 

error  is  proportional 


W-(T)dT,  where  a 

is  the  prediction  time  and  W  is  the  impulsive 
admittance  of  Figure  16.  Since  the  flat  noise 
impulses  occurring  after  the  time  at  which  the 
prediction  is  made  are  surely  unpredictable,  it 
is  clear  that  this  error  is  the  least  we  could 
expect  any  physical  prediction  network  to  have 

When  the  input  data  includes  noise  as  well  as 
the  signal  it  is  natural  to  think  of  the  situation 





Figure  18.   Circuit  representation  of  random  func- 
tions representing  signal  and  noise. 

case.  The  addition  of  noise  is  represented  by 
the  second  independent  source  of  flat  noise  with 
its  associated  shaping  network  Nh.  They  com- 
bine to  give  the  total  input  measured  at  Pt. 

This  diagram  emphasizes  the  fact  that  we 
think  of  the  noise  and  signal  as  originating 
from  different  physical  sources.  By  postulate, 
however,  we  are  not  able  to  separate  the 
sources  experimentally.  So  far  as  any  observed 
result  is  concerned,  consequently,  we  may  as 
well  deal  with  the  simplified  structure  shown 
in  Figure  19  which  contains  a  single  source  of 

f  LAT 





— * 




Figure  19.    Schematic  representation  of  Wiener's 
prediction  theory  when  there  is  noise. 

flat  noise  and  a  single  shaping  network.  The 
transfer  admittance  of  the  shaping  network  N, 
is  determined  by  adding  the  power  spectra  of 
signal  and  noise,  converting  the  result  to  an 
amplitude  characteristic,  and  computing  the 
corresponding  minimum  phase  according  to 
methods already used for the noise-free case.

Although  we  cannot  separate  the  signal  from 

d Note that the shaping network thus obtained is not
the same as the one we would secure by adding the
transfer  admittances  of  N.  and  N,  in  Figure  18  di- 
rectly. In  order  to  realize  the  same  total  power  at  P, 
in  each  case,  it  is  necessary  to  begin  by  adding  the 
powers  rather  than  the  amplitude  characteristics  asso- 
ciated with  the  two  paths. 




the  noise  completely,  we  saw  earlier  that  the 
mean  square  difference  between  the  total  input 
and  the  signal  is  minimized  if  we  multiply  the 
amplitude  of  the  input  at  each  frequency  by 
the  ratio  of  the  signal  power  to  the  sum  of  the 
signal  and  noise  powers.  A  fictitious  filter 
having  the  prescribed  amplitude  characteristic 
is represented by N4 in Figure 19. We assigned
N4 a zero phase characteristic so that there
may  be  no  lag  in  producing  the  result  at  P,. 
Thus  the  output  at  Ps  at  any  instant  represents 
the  best  conceivable  estimate  (in  the  least 
squares  sense)  of  the  signal  at  that  instant. 
The  assumption  of  zero  phase,  of  course,  makes 
N4 nonphysical, since it must have at least the
minimum  phase  characteristic  associated  with 
its  prescribed  amplitude  characteristic.  This, 
however,  is  not  an  objection  here  since  the 
structure  is  introduced  purely  for  purposes  of 

The  situation  is  now  reduced  to  a  form  in 
which  it  is  substantially  equivalent  to  the  one 
appearing in the zero-noise case. We assume a
series  of  random  impulses  at  P.,  which  would 
produce  responses  at  P,.  The  problem  is  that 
of  advancing  the  response  to  each  impulse  so 
that the same result appears a seconds earlier
at  terminal  P4.  The  solution  is  represented  by 
networks N2 and N3, which discharge functions
similar  to  those  of  the  correspondingly  labeled 
networks  in  Figure  11.  Thus,  the  network  N2 
is the reciprocal of N1 and is provided to make
terminal  P'2  equivalent  to  P„  as  a  source  of  im- 
pulses. Network  N3  is  defined  by  an  impulsive 
admittance  obtained  from  the  impulsive  admit- 
tance between  P,  and  P,  by  advancing  the 
latter  characteristic  a  units  in  time  and  then 
discarding  the  portion  at  negative  time. 
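
The rule for N3 (advance the impulsive admittance in time, then discard whatever falls at negative time) can be sketched on sampled data. The following Python fragment is only an illustration: the two admittances, the sample spacing, and the function names are hypothetical and are not taken from the figures. It compares an admittance that is zero at negative times with a two-sided one of the kind that may arise when a nonphysical network lies in the path.

```python
import math

def advance_and_truncate(values, zero_index, advance_steps):
    """Advance a sampled impulsive admittance and drop negative times.

    values[k] is the admittance at time (k - zero_index) * dt.
    Returns (kept, discarded): kept is the response for t >= 0 after the
    shift, discarded is everything pushed to t < 0.
    """
    cut = zero_index + advance_steps
    return values[cut:], values[:cut]

def energy(samples, dt):
    """Sum of squares times dt, a rough measure of what was thrown away."""
    return sum(v * v for v in samples) * dt

dt = 0.1
zero_index = 50
times = [(k - zero_index) * dt for k in range(101)]

# Hypothetical admittances, chosen only for illustration.
one_sided = [math.exp(-t) if t >= 0 else 0.0 for t in times]   # zero for t < 0
two_sided = [math.exp(-abs(t)) for t in times]                 # nonzero for t < 0

advance_steps = 10   # an advance of 1 second at dt = 0.1
kept_1, lost_1 = advance_and_truncate(one_sided, zero_index, advance_steps)
kept_2, lost_2 = advance_and_truncate(two_sided, zero_index, advance_steps)

print("discarded energy, one-sided:", energy(lost_1, dt))
print("discarded energy, two-sided:", energy(lost_2, dt))
```

In this sketch the two-sided admittance loses more when truncated, because everything originally at negative time is discarded along with the advanced portion.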

In  this  procedure  there  is  only  one  point  at 
which  the  situation  differs  from  that  without 
noise.  In  the  noise-free  case,  the  original  im- 
pulsive admittance  which  we  wished  to  advance 
in  time  was  identically  zero  at  negative  times. 
In  order  to  secure  a  physically  realizable  re- 
sult, we  needed  only  to  discard  the  portion  of  the 
impulsive admittance between t = 0 and t = a.
In  the  present  situation,  on  the  other  hand,  the 
impulsive admittance is taken from a path including
the nonphysical network N4. Thus the
admittance  may  be  expected  to  take  such  form 
as  that  shown  in  Figure  20,  with  nonzero  am- 

plitudes at  both  negative  and  positive  times, 
and  in  order  to  secure  a  physical  final  network 
it  is  necessary  to  discard  everything  to  the  left 
of  the  line  a. 

Figure  20.    Typical  impulsive  admittance  of  best 
smoothing network N4 in Figure 19.

This  difference  in  the  impulsive  admittance 
characteristics  has  two  consequences.  The  first 
is  the  fact  that  since  the  uncertainty  of  the 
prediction  is  measured  by  the  amount  of  im- 
pulsive admittance  which  must  be  discarded, 
it  is  evidently  greater  in  the  present  case  where 
we  are  discarding  much  more.  The  second  is 
the  fact  that  in  the  noise-free  case  uncertainty 
exists  only  for  a  positive  prediction  time.  A 
negative  prediction  time,  which  corresponds,  of 
course,  to  the  determination  of  the  value  as- 
sumed by  the  signal  at  some  time  in  the  past, 
can  be  set  into  the  analysis  as  easily  as  a  posi- 
tive prediction  time,  merely  by  shifting  the  im- 
pulsive admittance  to  the  right  rather  than  the 
left.  In  the  noise-free  case,  however,  there  is 
nothing  to  be  discarded  when  we  shift  to  the 
right,  since  the  impulsive  admittance  with 
which  we  begin  is  in  any  case  identically  zero 
for  negative  times.  Thus  the  uncertainty  in 
the  determination  of  any  past  value  of  the  sig- 
nal is  zero.  Since  we  have  postulated  no  noise 
to  confuse  the  data,  this  is,  of  course,  an 
inevitable  result.  As  soon  as  noise  is  included, 
on  the  other  hand,  there  is  no  such  sharp  dis- 
tinction between  the  future  and  the  past.e  The 
uncertainty  in  the  determination  of  the  true 
value  of  the  signal  in  the  near  past  is  almost 
as  great  as  it  is  in  estimating  what  the  signal 
will  be  in  the  near  future.  As  we  go  further 

*  This  statement  is  to  be  understood  in  a  physical 
rather  than  a  mathematical  sense.  It  is  not  intended 
to  imply  that  there  may  not  be  sharp  changes  of  be- 
havior in  the  impulsive  admittance  at  zero. 




and  further  into  the  past  the  uncertainty 
gradually  diminishes.  If  we  can  allow  ourselves 
unlimited  lag,  we  at  length  reach  a  point  at 
which  the  discarded  portion  of  the  impulsive 
admittance  characteristic  is  negligibly  small. 
This,  however,  does  not  mean  that  all  uncer- 
tainties have  disappeared,  but  merely  that  we 
can  base  our  estimate  of  the  signal  upon  the 
power-ratio  rule  developed  previously. 


It has been fairly easy to develop a qualitative
picture of the general characteristics of
typical  data-smoothing  networks.  As  we  have 
seen,  they  have  amplitude  characteristics  of  the 
low-pass  filter  type  combined  with  lagging 
phase  shifts.  No  corresponding  qualitative  pic- 
ture of  the  characteristics  of  a  typical  overall 
predicting  circuit  has,  however,  been  developed 
as  yet.  The  discussion  just  concluded  provides 
a  rule  for  determining  the  characteristics  of  a 
predicting  circuit  in  any  given  case,  but  pro- 
vides comparatively  little  in  the  nature  of  a 
description  of  the  result  we  may  expect  to 

In  any  particular  situation  we  can,  of  course, 
calculate  the  overall  characteristics  of  the  pre- 
dicting circuit.  A  simpler  way  of  character- 
izing the  overall  predictor  characteristic  quali- 
tatively, however,  is  based  upon  the  use  of  the 
attenuation-phase  relations  for  physical  net- 
works. We  need  merely  use  such  an  equation 
as  (3)  backward.  Thus,  we  have  previously 
shown  that  a  positive  phase  slope  corresponds 
to  a  lagging  output.  Correspondingly,  a  nega- 
tive phase  slope  can  be  interpreted  to  repre- 
sent a  lead,  or  in  other  words,  a  prediction.' 

If we assign $(dB/d\omega)_{\omega = 0}$ in equation (3) a
negative value, we see that $A - A_0$ must on the
average  be  negative.  In  other  words,  the  am- 
plitude characteristic  of  an  overall  prediction 
circuit  must  rise,  on  the  average,  as  we  proceed 
upward  from  zero  frequency.  This  is  in  marked 
contrast  to  a  data-smoothing  network,  which, 
as  we  have  seen,  tends  to  have  a  low-pass  filter 
type  of  characteristic  with  a  falling  amplitude 
characteristic  at  high  frequencies.  The  in- 
creased amplitude  of  response  may  have  two 
detrimental effects. In the first place, it evidently
produces a distorting effect on any signal
components to which it applies. In the
second  place,  it  produces  an  exaggerated  re- 
sponse to  noise. 

Examples  of  the  characteristics  of  overall 
prediction  circuits  are  readily  constructed  by 
reference  to  the  circuit  of  Figure  21.  Various 

Figure  21.  One-dimensional  prediction  circuit 
with  data-smoothing  networks. 

'  This,  of  course,  does  not  mean  that  a  network  with 
a  negative  phase  slope  can  predict  a  perfectly  arbitrary 
event.  We  can  hope  to  realize  a  negative  phase  slope, 
in  combination  with  a  flat  amplitude  characteristic, 
over  only  a  finite  band.  The  spectrum  of  an  arbitrary 
event,  that  is,  any  suddenly  applied  signal,  will  always 
include  important  components  running  out  to  infinite 
frequency,  where  the  negative  phase  slope  can  no  longer 
be  realized.  The  statement  does,  however,  mean  that  if 
we  suddenly  apply  a  signal  made  up  of  one  or  more 
low-frequency  sinusoids,  and  wait  for  the  steady  state 
to  become  established,  the  output  will  appear  to  lead 
the  input  by  a  time  equal  to  the  slope  of  the  negative 
phase  characteristic. 

particular  results  are  obtained  by  assigning 
particular  characteristics  to  the  data-smooth- 
ing network.  Thus,  if  the  data-smoothing  net- 
work is  absent  entirely  the  transmission 
through  the  path  containing  the  differentiator 
is $i\omega t_f$, since differentiation is equivalent to
multiplication by $i\omega$. The attenuation of the
overall circuit is consequently $A = -\log |1 + i\omega t_f|$.
This is plotted as curve I of Figure
22.  The  increasing  amplitude  characteristic  at 
high  frequencies  is  obviously  due  fundamen- 
tally to  the  increased  transmission  through  the 
differentiator  circuit. 
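
The shape of this characteristic is easy to tabulate. The following Python sketch takes the attenuation to be A = -log|1 + i*omega*t_f*S(omega)| in nepers, where S is the transmission of the data-smoothing network (S = 1 when it is absent). The single-pole smoother, the value of t_f, and all function names are assumptions made for illustration only.

```python
import math

def attenuation_nepers(omega, t_f, smoother=None):
    """Attenuation A = -log|1 + i*omega*t_f*S(omega)| in nepers.

    smoother is an optional function S(omega) for the data-smoothing
    network; None means the network is absent (S = 1).
    """
    rate_path = 1j * omega * t_f
    if smoother is not None:
        rate_path *= smoother(omega)
    return -math.log(abs(1.0 + rate_path))

t_f = 1.0  # illustrative prediction time in seconds

def first_order(omega):
    # Hypothetical single-pole smoother with time constant equal to t_f.
    return 1.0 / (1.0 + 1j * omega * t_f)

for omega in (0.1, 1.0, 10.0, 100.0):
    bare = attenuation_nepers(omega, t_f)
    smoothed = attenuation_nepers(omega, t_f, first_order)
    print(f"omega = {omega:6.1f}  no smoothing: {bare:8.3f}  smoothed: {smoothed:8.3f}")
```

Without smoothing the negative attenuation (gain) grows without limit as the frequency rises. With the assumed single-pole smoother the gain levels off at high frequencies.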

If  the  data-smoothing  network  is  assigned 
the characteristic $(1 + i\omega a)^{-1}$, corresponding to
a  very  simple  low-pass  filter  type  of  response, 
the  overall  transmission  becomes  that  shown 
by  curve  II  in  Figure  22.  (It  is  assumed  that 
$a = t_f$, for simplicity.) The negative attenuation
at  high  frequencies  is  much  reduced.  This  is 
paid  for  by  an  increased  amplitude  of  response 
at  low  frequencies,  but  since  the  integration  in 
(3)  takes  place  on  an  inverse  frequency  scale, 
the  low-frequency  fragment  is  much  less  than 
the  gain  reduction  at  high  frequencies.  Curve 




III shows the result when the data-smoothing
network is assigned the characteristic
$(1 + i\omega a)^{-2}$. Finally, curve IV shows the result
obtainable when there is also a filter in the








Figure  22.  Attenuation  characteristics  of  predic- 
tion circuit  shown  in  Figure  21. 

present-position  circuit  (as  shown  by  the 
broken  lines  in  Figure  21),  so  that  there  may 
be a net positive attenuation at high frequencies.

In  view  of  the  inverse  frequency  scale  in  (3), 
the  gross  negative  attenuation  will  be  mini- 
mized if  the  negative  attenuation  region  is 
placed  very  close  to  zero  frequency.  This,  how- 
ever, means  that  much  of  the  signal  energy 
falls  in  the  negative  attenuation  region  so  that 
in  certain  respects,  at  least,  the  signal  response 
must  be  seriously  injured.  For  example,  in  the 
specific  circuits  just  discussed  we  can  place  the 
negative  attenuation  region  at  very  low  fre- 
quencies by  choosing  very  long  time  constants, 
a,  in  the  data-smoothing  networks,  with  the 
consequence  that  the  circuits  will  operate  cor- 
rectly for  any  long  continued  straight  line  path, 
but  will  be  very  sluggish  in  changing  from  one 
straight  line  to  another.  If  the  negative  attenu- 
ation region  is  placed  at  higher  frequencies,  on 
the  other  hand,  the  signal  response  is  improved 
but  beyond  certain  limits  the  circuit  becomes 
unbearably  sensitive  to  noise. 

Quantitative illustrations of these relationships
are quickly constructed. Suppose, for example,
that the prediction time is 2 seconds.
From (3) this is consistent with an attenuation
characteristic having zero attenuation
below $\omega = 1$ and a net gain of $\pi$ nepers thereafter.
In other words, the amplitudes of all
frequencies above $\omega = 1$ are increased by a factor
of about 22 to 1. If the region of added
gain is pushed to a higher frequency or concentrated
within a narrow band, the multiplying
factor rapidly becomes larger. For example,
if we maintain A at approximately zero
below $\omega = 2$, the average gain above this point
must be $2\pi$ nepers, corresponding to a multiplying
factor of 600 to 1. We secure the same
factor by attempting to concentrate the region
of negative attenuation in the band between
$\omega = 1$ and $\omega = 2$. The multiplying factor also
goes up rapidly as we increase the prediction
time. For example, with the gain uniformly
spread over the frequency region above $\omega = 1$
the multiplying factor is 500 for a prediction
time of 4 seconds, or more than 10,000 for a
prediction time of 6 seconds.
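
These figures can be checked with a few lines of arithmetic. The Python sketch below assumes that equation (3), which is not reproduced in this section, makes the prediction time equal to (2/pi) times the integral of the net gain (in nepers) divided by omega squared. That reading, and the function name, are assumptions of the sketch. It gives multiplying factors of roughly 23, 535, and 12,400 for gains of pi, 2 pi, and 3 pi nepers, the same order as the rounded values quoted above.

```python
import math

def required_gain_nepers(prediction_time, w_low, w_high=math.inf):
    """Uniform net gain (nepers) needed over the band w_low..w_high.

    Assumes prediction_time = (2/pi) * integral of gain / omega**2 over
    the band, which is this sketch's reading of equation (3).
    """
    band_integral = 1.0 / w_low - 1.0 / w_high   # integral of d(omega)/omega**2
    return math.pi * prediction_time / (2.0 * band_integral)

cases = [
    ("2 s, gain above omega = 1", 2.0, 1.0, math.inf),
    ("2 s, gain above omega = 2", 2.0, 2.0, math.inf),
    ("2 s, gain between 1 and 2", 2.0, 1.0, 2.0),
    ("4 s, gain above omega = 1", 4.0, 1.0, math.inf),
    ("6 s, gain above omega = 1", 6.0, 1.0, math.inf),
]

for label, t_pred, lo, hi in cases:
    gain = required_gain_nepers(t_pred, lo, hi)
    print(f"{label:28s} gain = {gain / math.pi:.1f} pi nepers, factor = {math.exp(gain):,.0f}")
```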

Reasonable  multiplying  factors  with  long 
prediction  times  can  be  obtained  only  by  carry- 
ing the  negative  attenuation  region  to  very  low 
frequencies.  As  indicated  previously,  the  cost 
of  this  is  an  increase  in  the  time  required  for 
the  signal  to  change  from  one  constant  or 
nearly constant value to another. For example,
in the first illustration above, if the region
of $\pi$ nepers net gain is carried down from
$\omega = 1$ to $\omega = 0.2$ the integral in (3) is just five
times  as  great  as  it  was  before,  so  that  the 
characteristic  corresponds  to  a  prediction  time 
of  10  rather  than  2  seconds.  This  change 
would  correspond  to  an  increase*  from  perhaps 
4  or  5  to  perhaps  20  or  25  seconds  in  the  time 
required  for  the  circuit  to  settle  from  one  con- 
stant value  to  another. 

Practical  examples  of  the  transmission  char- 
acteristics of  overall  prediction  circuits,  with 
particular  emphasis  on  the  dominant  effect  of 
even  very  small  negative  attenuations  at  ex- 
tremely low  frequencies,  are  shown  later  in 
Figures 5 to 8, inclusive. In the linear predictor,
$A - A_0$ varies as $-k\omega^2$ near zero, and it is
easily  seen  that  such  a  term  makes  a  finite  con- 

«  Only  rough  numbers  can  be  given,  since  circuits 
with  the  square-cornered  attenuation  characteristics 
chosen  for  illustrative  purposes  would  have  very  ripply 
transient  characteristics,  corresponding  to  no  very  well 
marked  settling  time. 




tribution  to  the  integral  in  (3) .  On  the  other 
hand,  the  attenuation  of  the  quadratic  predic- 
tor, which  is  capable  of  dealing  exactly  with 
polynomial  functions  of  time  of  the  second 

degree  or  less,  is  necessarily  zero  at  the  origin" 


" See Quasi-Distortionless Prediction Networks in Appendix A.

to terms of the order of $\omega^4$, so that the integral
in  this  region  can  be  neglected.  This  slight 
difference  between  the  two  characteristics  at 
frequencies  of  the  order  of  0.01  cycle  per 
second  and  below  is  sufficient  to  balance  the 
obviously  greater  negative  attenuation  of  the 
quadratic  predictor  at  higher  frequencies. 


Chapter  9 


THE  discussion  in  the  previous  two  chap- 
ters has  been  based  upon  the  assumption 
that  the  least  squares  criterion  forms  a  suita- 
ble measure  of  performance  for  a  predicting 
network.  This  assumption  permitted  us  to  re- 
strict our  attention  to  the  amplitude  spectra 
of the signal and noise, leaving phase relations
entirely  out  of  account.  Thus,  both  signal  and 
noise  could  be  thought  of  as  "random  noise" 
functions  characterized  by  random  phases  and 
Gaussian  distributions,  as  described  in  the 
preceding  chapter.  So  far  as  the  noise  is  con- 
cerned, there  seems  to  be  nothing  wrong  with 
this  assumption.  In  the  case  of  the  signal,  how- 
ever, it  appears  that  significant  phase  relations 
may  exist.  This  chapter  will  consequently  set 
up  an  alternative  analysis  which  permits  the 
significance  of  possible  phase  relations  in  the 
target  paths  to  be  estimated. 

The  alternative  analysis  is  based  upon  the 
assumption  that  the  target  courses  are  sequen- 
ces of  analytic  segments  of  different  lengths 
joined  together.  These  segments  are  simple 
predictable  curves  such  as  straight  lines,  pa- 
rabolas, and  circles.  Significant  phase  relations 
are  implied  by  the  assumption  that  there  are 
sudden changes from one type of course to another.

This  picture  of  target  paths  is,  of  course, 
extreme.  There  are  no  such  sharp  discontinui- 
ties between  one  segment  and  another,  nor  do 
airplanes  fly  perfectly  along  simple  curves 
even  for  limited  periods.  Nevertheless,  it  is 
the  conception  of  target  courses  upon  which 
the  rest  of  our  analysis  is  based.  The  reasons 
for  believing  that  it  is  a  closer  approximation 
to  actual  target  courses  than,  say,  a  random 
noise  function  with  the  same  power  spectrum 
would  be,  are  given  later.  Perhaps  more  im- 
portant is  the  fact  that  the  possibility  of  hit- 
ting an  airplane  flying  along  such  a  simple 
analytic  arc  is  much  greater  than  it  would  be 
if  we  were  attempting  to  predict  a  correspond- 
ing random  noise  function.  It  is  thus  advan- 
tageous to  take  the  analytic  arc  assumption  as 
a  basis  for  designing  the  prediction  circuit, 

even  if  the  assumption  seems  to  be  reasonably 
well  justified  over  only  occasional  segments  of 
actual  target  paths.  An  example  of  such  a 
situation  is  furnished  by  the  bombing  run 
illustration  described  in  Chapter  7. 

As a corollary to the analytic arc assumption
it is also assumed that the theoretical
predicted  point  must  be  quite  close  to  the  actual 
target  position  if  the  probability  of  scoring  a 
hit  is  to  be  appreciable.  In  other  words,  such 
dispersive  factors  as  random  errors  in  com- 
puter or  gun  or  the  lethal  radius  of  the  shell, 
which  would  tend  to  produce  occasional  hits  at 
long  distances  from  the  theoretical  predicted 
point,  are  quite  small.  This  is  such  a  plausible 
assumption  in  the  light  of  present-day  antiair- 
craft experience  that  its  critical  importance  in 
the  present  argument  is  likely  to  go  unper- 
ceived.  However,  this  is  the  assumption  which 
limits  consideration  to  small  errors  in  predic- 
tion, whereas  the  least  squares  criterion  natu- 
rally gives  greatest  emphasis  to  large  errors. 
If,  for  example,  antiaircraft  projectiles  were 
suddenly endowed with a much greater destructive
radius, we would be much more interested
in fairly large misses, and the objections
to the least squares criterion would disappear.

These  postulates  are  discussed  in  more  detail 
in  the  following  sections.  In  anticipation  of 
this  discussion  the  following  conclusions  may 
be  mentioned: 

1.  With  the  assumptions  as  stated,  the  pre- 
diction should  be  on  a  modal  rather  than  a 
least  squares  basis.  In  other  words,  the  gun 
should  be  aimed  at  the  most  probable  future 
position  of  the  target. 

2.  Modal  prediction  requires  evaluation  of 
the  parameters  of  the  analytic  arc  the  target 
is  at  present  traversing.  This  can  be  accom- 
plished by  smoothing  the  values  of  these  pa- 
rameters evaluated  for  a  period  in  the  past. 

3.  If  the  smoothing  is  performed  by  linear 
invariable  networks,  the  impulsive  admittances 
of  these  networks  should  have  a  definite  cutoff 
after  a  finite  smoothing  time.  By  this  means 




all  data  over  a  certain  age  are  given  zero  weight. 
The  method  of  calculating  the  proper  smooth- 
ing time  is  developed. 

4.  Definite  advantages  can  be  obtained  from 
circuits  with  variable  smoothing  times  if  such 
systems  can  be  satisfactorily  mechanized. 
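
The effect of a definite cutoff, as in conclusion 3, can be seen in a small numerical sketch. The Python fragment below compares a smoother with finite memory against one with an exponentially decaying memory, applied to a parameter that jumps from one value to another. The uniform weighting, the window length, the exponential constant, and the sample values are all assumptions chosen for illustration. The text requires only that data beyond a certain age receive zero weight.

```python
def finite_memory_average(samples, window):
    """Average of the most recent `window` samples; older data get zero weight."""
    out = []
    for k in range(len(samples)):
        recent = samples[max(0, k - window + 1): k + 1]
        out.append(sum(recent) / len(recent))
    return out

def exponential_average(samples, alpha):
    """Smoother with no cutoff: old data fade but never reach zero weight."""
    out = []
    state = samples[0]
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out

# Hypothetical velocity parameter: one segment at 100, then a sudden change to 40.
velocity = [100.0] * 20 + [40.0] * 20

boxcar = finite_memory_average(velocity, window=5)
fading = exponential_average(velocity, alpha=0.3)

# Five samples after the break the finite-memory smoother holds only new data.
print("finite memory, 5 samples after break:", boxcar[24])
print("exponential,   5 samples after break:", fading[24])
```

Once the window has moved wholly onto the new segment, the finite-memory estimate is free of the old course, whereas the exponential smoother still carries a residue of it.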


The  target  courses,  like  the  tracking  errors, 
can  be  thought  of  as  a  statistically  generated 
set  of  functions — that  is,  a  stochastic  process. 
The  structure  of  this  process  is,  however,  very 
different  from  that  of  the  tracking  errors.  It 
is by no means satisfactory to assume the
target  courses  to  be  equivalent  to  a  random 
noise  having  the  same  power  spectrum  as  the 
target  courses.  As  we  pointed  out  in  Chapter 
7,  the  target  is  piloted  by  a  purposeful  human 
being.  It  tends  to  follow  a  definite  simple  curve 
for  a  period  of  time  and  then  to  shift  to  a  new 
simple  curve.  Much  of  the  flight  is  in  attempted 
straight  lines  with  constant  velocity.  Most  of 
the  remainder  can  be  considered  to  be  segments 
of  circles  or  helices  in  space,  or  as  segments  of 
parabolas  or  higher  degree  curves.  Straight 
line  constant  speed  flight  corresponds  to  the 
airplane  controls  in  a  neutral  position.  The 
helical  flight  is  a  natural  generalization  allow- 
ing arbitrary,  but  fixed,  positions  of  the  con- 
trols. The  curves  which  are  parabolic  functions 
of  time  correspond  to  constant  acceleration  in 
the  three  space  coordinates.   Thus,  all  these 
assumptions have a reasonable physical background.

Most  antiaircraft  computers  are  constructed 
on  the  assumption  of  straight  line  flight,  al- 
though some  work  has  been  done  in  World 
War  II  on  curved  flight  directors  both  with  the 
helical  and  the  parabolic  assumptions.  There  is 
not  a  great  deal  of  difference  in  these  two 
generalizations  from  the  practical  point  of 
view,  since  determination  of  acceleration  terms 
is  subject  to  such  large  errors  in  any  case. 

The  important  part  of  this  representation 
of  the  target  courses  is  that  they  consist  of 
segments  of  simple  analytic  curves  joined  to- 
gether. The  individual  segments  are  completely 
predictable  if  we  have  a  part  of  the  segment 
given  exactly.  One  need  merely  evaluate  the 
parameters  of  the  segment  from  the  given  part 

and  evaluate  the  curve  for  t  -  tf.  The  unpre- 
dictable part  of  the  target  courses  is  due  to  the 
possibility  of  sudden  changes  from  one  segment 
to  another.  With  random  noise  functions  the 
unpredictableness  occurs  continuously. 

This  simplified  description  of  the  target 
courses  as  piecewise  analytic  functions  must 
be  recognized  as  only  a  first  approximation.  A 
more  complete  description  of  the  target  course 
would  include  the  "fine  structure,"  the  con- 
necting curves  between  the  various  analytic 
segments  and  the  deviations  from  the  segments 
due  to  random  air  disturbances  and  similar 
causes.  This  latter  effect,  the  wandering  of  the 
target  from  its  intended  path,  might  be  reason- 
ably well  represented  by  the  addition  of  a 
random  noise  function  to  the  piecewise  analytic 
functions  described  above. 


The  analytic  segments  of  which  the  course 
is  supposed  to  consist  are  not  all  of  the  same 
duration  —  we  may  assume  some  probability 
distribution  of  the  duration  of  these  segments. 
The  simplest  assumption  here  is  that  the 
breaks  occur  in  a  Poisson  distribution  in  time. 
This  assumption  is  not  necessary  for  our 
analysis  but  is  a  reasonable  one  and  leads  to 
a  simple  mathematical  treatment.  Any  other 
reasonable distribution would give comparable results.

A  series  of  events  is  said  to  occur  in  a 
Poisson  distribution  in  time  if  the  periods  be- 
tween successive  events  are  independent  in  the 
probability  sense  and  are  controlled  by  a  distri- 
bution function 

$$p(l)\,dl = \frac{1}{a}\,e^{-l/a}\,dl .$$

Here  p(l)dl  is  the  probability  of  an  interval  of 
length between l and l + dl. This means that
the  frequency  of  intervals  of  a  given  length  is 
a  decreasing  exponential  function  of  the  length. 
This  type  of  distribution  is  familiar  in  physics 
as  describing  the  decay  of  radioactive  sub- 
stances. The  time  a  in  the  distribution  function 
is  the  average  length  of  the  intervals,  since 





$$\int_0^{\infty} l\,\frac{1}{a}\,e^{-l/a}\,dl = a .$$

It is related to the "half life" b of the interval by

$$b = a \ln 2 .$$

The  single  number  a  completely  specifies  the 
Poisson  distribution.  The  events  may  be  said 
to  be  happening  as  randomly  as  possible  apart 
from  the  fact  that  they  occur  at  an  average 
rate  of  1/a  per  second. 

Another  way  of  describing  a  Poisson  distri- 
bution of  events  is  the  following.  The  probabil- 
ity of  an  event  in  a  small  interval  of  duration 
$dl$ is $(1/a)\,dl$ and is independent of whether or
not  events  have  occurred  in  any  other  nonover- 
lapping  intervals. 
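
These properties can be checked by simulation. The Python sketch below draws intervals from an exponential distribution with mean a, using the standard library's `random.expovariate`, and compares the sample mean with a and the fraction of intervals longer than the half life b = a ln 2 with one half. The value of a, the sample size, and the seed are arbitrary choices made for the illustration.

```python
import math
import random

def simulate_intervals(mean_length, count, seed=0):
    """Draw `count` intervals between breaks with the given mean length."""
    rng = random.Random(seed)
    # expovariate takes the rate, i.e. 1 / (mean interval length).
    return [rng.expovariate(1.0 / mean_length) for _ in range(count)]

a = 10.0                                  # illustrative mean segment length, seconds
intervals = simulate_intervals(a, 200_000)

sample_mean = sum(intervals) / len(intervals)
half_life = a * math.log(2.0)
fraction_longer = sum(1 for l in intervals if l > half_life) / len(intervals)

print(f"sample mean interval      : {sample_mean:.3f} (a = {a})")
print(f"half life b = a ln 2      : {half_life:.3f}")
print(f"fraction longer than b    : {fraction_longer:.3f}")
```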



Let  us  suppose  that  we  have  a  record  of  the 
course  of  the  target  up  to  the  present  time  and 
a  complete  statistical  description  of  the  set  of 
target  courses.  What  can  then  be  said  about  the 
position of the target $t_f$ seconds from now? If
we  were  able  to  analyze  the  data  completely 
the  most  we  could  obtain  would  be  a  probability 
distribution  function  for  the  future  position. 
This  distribution  function  would  give  the  prob- 
ability, in  the  light  of  the  course  history,  of 
the  target  being  at  any  point  in  space  at  the 
future  time.  This  function  would  assume  large 
values at likely points and low values at unlikely
points. For $t_f$ small the distribution
would be highly concentrated and for larger $t_f$
it  would  tend  to  spread  out. 

In  the  simple  case  we  have  been  discussing, 
of  a  Poisson  distribution  of  sudden  changes  in 
type  of  course,  the  distribution  consists  of  two 
parts.  First,  there  is  a  spike  of  probability  at 
one  point,  the  continuation  of  the  present  pre- 
dictable segment.  Second,  there  is  a  continuous 
distribution  which  corresponds  to  possible 
changes  to  a  new  segment  during  the  time  of 
flight. As $t_f$ increases the total probability in
the  spike  decreases  exponentially  toward  zero, 
and  the  total  in  the  continuous  part  increases 
exponentially  toward  unity.  The  behavior  is 
roughly  as  indicated  in  Figure  1. 
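
Under the Poisson assumption the division of probability between the two parts follows directly: the spike carries the probability that no break occurs during the time of flight, which is exp(-t_f/a) when a is the mean segment length. The short Python sketch below tabulates this. The function names and the numerical values are illustrative assumptions.

```python
import math

def spike_probability(t_f, a):
    """Probability that no change of course occurs within t_f seconds."""
    return math.exp(-t_f / a)

def continuous_probability(t_f, a):
    """Probability that at least one change of course occurs within t_f seconds."""
    return 1.0 - spike_probability(t_f, a)

a = 20.0   # illustrative mean segment length in seconds
for t_f in (0.0, 5.0, 10.0, 20.0, 40.0):
    print(f"t_f = {t_f:5.1f} s  spike = {spike_probability(t_f, a):.3f}  "
          f"continuous = {continuous_probability(t_f, a):.3f}")
```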




Figure 1. Probability distribution of future position of target, assuming piecewise analytic courses.

A  very  different  type  of  future  position  dis- 
tribution is  exhibited  with  other  assumptions 
about  the  target  courses.  For  example,  suppose 
the  courses  were  random  noise  functions  with 
the  power  spectrum 

$$P(\omega) = \frac{1}{a^2 + \omega^2} .$$

A  typical  noise  function  with  this  spectrum  is 
shown  in  Figure  2.  In  Figure  3  is  shown  a 
typical  velocity  under  the  other  assumption, 
that  the  courses  are  piecewise  analytic  and  in 
fact  straight  lines  between  breaks.  If  the 
breaks  are  Poisson  distributed,  both  Figure  2 
and  Figure  3  have  the  same  power  spectrum, 
$1/(a^2 + \omega^2)$. The future distribution of velocities
for Figure 3 is shown in Figure 1, and for
Figure  2,  it  will  be  as  shown  in  Figure  4.  In  the 
random  noise  case  the  future  distribution  is  a 




Gaussian  distribution  with  no  spike.  The  center 
of  this  distribution  decreases  exponentially  to- 
ward zero  with  increasing  time  of  flight  ac- 
cording to  the  formula 

$$X_{t_f} = X_0\, e^{-a t_f}$$

where $X_0$ is the present value of the function
and $X_{t_f}$ is the mean of the future distribution.


Figure  2.    Typical  noise  function. 

The standard deviation $\sigma$ of the distribution increases
exponentially toward the rms value of
the function according to

$$\sigma = A\,(1 - e^{-2 a t_f})^{1/2} .$$
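
The behavior of the random noise case can be tabulated directly. The Python sketch below uses the standard result for a Gaussian process with a spectrum proportional to 1/(a^2 + omega^2): the conditional mean decays as exp(-a t_f) and the conditional standard deviation is A times the square root of 1 - exp(-2 a t_f). The square-root form is an assumption about the formula intended above, and the numerical values are illustrative.

```python
import math

def future_mean(x0, rate, t_f):
    """Center of the future distribution, decaying from the present value x0."""
    return x0 * math.exp(-rate * t_f)

def future_std(rms, rate, t_f):
    """Spread of the future distribution, rising toward the rms value."""
    return rms * math.sqrt(1.0 - math.exp(-2.0 * rate * t_f))

x0, rms, rate = 5.0, 3.0, 0.2   # illustrative present value, rms value, and a
for t_f in (0.0, 1.0, 5.0, 20.0):
    print(f"t_f = {t_f:5.1f}  mean = {future_mean(x0, rate, t_f):6.3f}  "
          f"std = {future_std(rms, rate, t_f):6.3f}")
```

Unlike the piecewise analytic case, nothing in this distribution singles out a continuation of the present course. The center simply drifts toward zero while the spread widens.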

Supposing  that  this  distribution  function 
could  be  determined,  where  should  the  gun  be 
aimed?  The  answer  to  this  will  depend  on  two 
factors:  the  gun  dispersion,  and  the  lethal 




Figure  3.    Typical  velocity  function. 

effects  of  the  shell.  If  the  gun  is  aimed  to 
explode  the  shell  at  a  certain  point  in  space, 
the  shell  will  not  necessarily  explode  at  that 
point,  but  rather  there  will  be  a  distribution  of 
positions  centered  about  the  point  aimed  at, 
because  of  gun  dispersion.  Also,  if  the  shell 
explodes  at  a  certain  point  and  the  target  is  at 

another  point,  there  will  be  a  certain  proba- 
bility of  lethal  effect  which  decreases  rapidly 
with  increasing  distance  between  the  points. 
These  two  functions  could  be  combined  by  a 
product  integration  to  give  the  probability  of 
a hit if the target is at one point and




Figure  4.  Probability  distribution  of  future  posi- 
tion of  target,  assuming  courses  with  random 
noise  properties. 

the  gun  aimed  to  explode  the  shell  at  a  second 
point.  To  determine  the  probability  of  a  hit 
when  aiming  at  a  certain  point,  then,  we  should 
multiply  the  probability  of  the  target  being  at 
each  point  in  space  by  the  probability  of  lethal 
effect  when  it  is  at  that  point  and  integrate  the 
product  over  all  space.  The  optimum  point  of 
aim  will  be  the  one  which  maximizes  this  in- 
tegrated product. 

In  one  dimension  this  may  be  expressed 
mathematically  as  follows.  Let  P(x)  be  the 




future  position  distribution  of  the  target,  so 
that  P(x)dx  is  the  probability  of  it  being  in 
the  interval  from  x  to  x  +  dx  at  the  future  time. 
Let  Q(x,y)  be  the  probability  of  hitting  the 
target  if  the  gun  is  aimed  at  point  y  and  the 
target  is  at  point  x.  Then  the  total  probability 
of  a  hit  when  aiming  at  point  y  is 



$$R(y) = \int_{-\infty}^{\infty} P(x)\,Q(x,y)\,dx .$$

The  point  of  aim  y  should  be  chosen  to  maxi- 
mize R(y). 

In  the  cases  we  consider,  the  lethal  radius  of 
the  shell  and  the  dispersion  of  the  gun  are  both 
assumed  to  be  small  in  comparison  with  the 
range  of  future  positions  if  there  is  a  change 
of  course  during  the  time  of  flight.  This  means 
that Q(x,y) is small unless x is very near to y.
Q(x,y) can be, in fact, considered to be a $\delta$
function  of  (x-y),  and  the  value  R(y)  is  then 
just  a  constant  times  P(y).  Thus,  the  best 
aiming  point  under  this  assumption  is  the  most 
probable  future  position  of  the  target.  The  as- 
sumption of  small  lethal  distance  is  generally 
valid  with  antiaircraft  fire  and  ordinary  chemi- 
cal explosive  shells. 

Now  the  most  probable  future  position  in  our 
case  is  the  spike  of  probability  corresponding 
to  the  analytic  extrapolation  of  the  present  seg- 
ment of  the  target  course.  To  determine  its 
position  one  must  find  the  parameters  of  this 
segment and evaluate for $t_f$ seconds in the
future. For example, if the segments are assumed
to be straight lines (constant velocity
target) the velocity components are determined
and multiplied by $t_f$ to give the predicted
change in position. These changes are added to
the present position to give the future position.
If helical or parabolic segments are assumed,
the parameters of these curves are determined
from the past data, and the curves extrapolated
$t_f$ seconds into the future.

These  conclusions  may  be  contrasted  with 
the  idea  of  aiming  at  the  point  which  mini- 
mizes the  mean  square  error.  The  least  squares 
criterion  amounts  to  aiming  at  the  mean  or 
center  of  gravity  of  the  future  distribution  of 
position.  This  point  will  ordinarily  be  under 
the  continuous  part  of  the  distribution  and  not 
at  the  spike;  e.g.,  the  point  marked  in  Figure  1. 
Its  position  depends  to  a  considerable  extent  on 

distant  parts  of  the  distribution,  which  would 
surely be complete misses in any case. The
chief advantage of the least squares criterion
is  that  it  fits  in  well  with  the  mathematical 
tools  suitable  to  these  problems,  leading  to 
solvable  equations. 
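
The contrast between the two aiming rules can be checked numerically. The sketch below is only an illustration: the mixture used for P(x), the 10-yard lethal radius and the grid are all hypothetical values chosen to mimic a spike plus a broad continuous part, and Q(x,y) is idealized as 1 inside the lethal radius and 0 outside.

```python
import math

def gaussian(x, mean, sigma):
    """Normal probability density."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def future_density(x):
    # Hypothetical P(x): a narrow spike at x = 0 (target holds its course)
    # plus a broad part centred at 300 yards (target changes course).
    return 0.4 * gaussian(x, 0.0, 5.0) + 0.6 * gaussian(x, 300.0, 150.0)

def hit_probability(aim, lethal_radius=10.0, step=1.0):
    # R(y) = integral of P(x) Q(x, y) dx, with Q = 1 inside the lethal radius
    # and 0 outside, evaluated by the midpoint rule.
    cells = int(round(2.0 * lethal_radius / step))
    start = aim - lethal_radius
    return sum(future_density(start + (k + 0.5) * step) for k in range(cells)) * step

grid = [float(y) for y in range(-200, 801, 5)]
best_aim = max(grid, key=hit_probability)                      # maximizes R(y)
total = sum(future_density(x) for x in grid)
mean_aim = sum(x * future_density(x) for x in grid) / total    # least squares aim

print(best_aim, round(mean_aim), hit_probability(best_aim), hit_probability(mean_aim))
```

With these assumed numbers the maximizing aim sits on the spike, while the least squares aim falls under the continuous part, where the chance of a hit is far smaller.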

The least squares criterion will still appear
in our analysis in that we attempt to smooth
our course parameters in such a way as to
minimize the mean square error in these, a
very different thing from minimizing the mean
square error in the predicted position of the
target.

NECESSITY OF A SHARP CUTOFF

The changes in the course parameters between
adjacent segments can be very large.
Also,  at  the  start  of  operations  and  in  changing 
from  one  target  to  another  there  will  be  large 
and  erratic  variation  of  the  input  to  the 
smoothing  and  predicting  circuits,  unrelated  to 
the  present  target  course.  If  any  of  these  data 
are  used  in  prediction,  the  result  will  almost 
surely  be  a  miss  because  of  the  small  lethal 
radius  of  the  shell.  The  only  way  to  eliminate 
these  errors  in  a  linear  invariable  system  is  to 
have  all  weighting  functions  cut  off  sharply 
after a short time. Then all data over a certain
age  are  eliminated.  Hits  will  occur  only  when 
the  target  has  been  on  a  predictable  segment  for 
this  length  of  time  or  more  and  remains  there 
at least t_f seconds in the future.

Suppose  the  weighting  function  for  velocity 
has  a  1  per  cent  tail  beyond  the  cutoff  point 
and  that  the  trackers  start  following  the  target 
from  a  zero  position.  Then  after  the  smoothing 
time  there  will  be,  because  of  the  lack  of  exact 
cutoff,  a  1  per  cent  error  in  velocity.  If  the 
time  of  flight  were  15  seconds  and  the  target 
velocity  200  yards  per  second,  this  represents 
an error of 30 yards in predicted position.
Since  this  is  comparable  to  the  other  errors  in 
a  typical  director,  we  conclude  that  the  tail  of 
the  smoothing  curve  should  not  be  much  greater 
than  1  per  cent  of  its  total  area. 
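
The arithmetic behind this estimate is short enough to spell out. The function name below is hypothetical; the numbers are the ones quoted in the text (1 per cent tail, 200 yards per second, 15 seconds time of flight).

```python
def position_error_from_tail(tail_fraction, target_speed, time_of_flight):
    # A tail of the velocity weighting function that still "sees" the zero
    # starting position leaves a proportional error in the smoothed velocity,
    # which is then multiplied by the time of flight in the prediction.
    velocity_error = tail_fraction * target_speed
    return velocity_error * time_of_flight

error_yards = position_error_from_tail(0.01, 200.0, 15.0)
print(error_yards)
```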

Under  the  assumptions  we  have  made,  the 
proper  smoothing  time  to  maximize  the  number 
of hits can be determined as follows. Let P(l)
be the probability that a predictable segment
of the course lasts for l seconds or more. In
the Poisson case this function is

$$P(l) = e^{-l/a} .$$

With  a  given  smoothing  time  S  there  will  be  a 
certain  probability  of  hitting  the  target,  as- 
suming it  has  been  on  the  present  segment  for 
S  seconds  in  the  past  and  will  remain  there  for 
t_f seconds in the future. We assume changes
in  course  to  be  so  large  that  any  change  re- 
sults in  a  miss.  This  probability  of  a  hit  Q(S), 
provided  it  remains  on  the  course,  will  be  an 
increasing  function  of  S.  Ordinarily  the  stand- 
ard deviation  will  decrease  as  the  square  root 
of  the  smoothing  time.  We  have  assumed  the 
lethal  radius  of  the  shell  small  compared  to  the 
dispersion of shells about the target. The
probability of a hit will then vary inversely with
the volume through which the shells are
dispersed. If the gun itself had no dispersion but
all errors were due to tracking errors (and if
the tracking error spectrum is flat), the
probability of a hit would then vary as K S^(3/2) for
S in the region of interest. This is because
there are three dimensions and the expected
error in each of these is decreasing as S^(-1/2).
With gun dispersion present, Q(S) will have
the form

$$Q(S) = \frac{K}{\left( \sigma_1^2 + \sigma_2^2\, a/S \right)^{3/2}}$$

where σ_1 is the standard deviation due to the
gun dispersion, and σ_2 √(a/S) that due to
tracking errors. The sum of the squares is the total
variance in each dimension and the three-
halves power gives the total dispersion volume.

When these two functions P(l) and Q(S)
are known, the best smoothing time is that
which maximizes the product

$$P(S + t_f) \cdot Q(S) .$$

The first term is the probability of a
predictable segment of the course lasting S + t_f
seconds, and the second term is the probability of
a hit if it does last that long. Therefore, the
product is the probability of a hit with
smoothing time S.

In the Poisson case, with no gun dispersion,
the calculation is as follows:

$$P(l) = e^{-l/a}$$

$$P(S + t_f) = e^{-(S + t_f)/a} = A\, e^{-S/a}$$

$$Q(S) = K S^{3/2}$$

$$f(S) = P(S + t_f)\, Q(S) = B\, e^{-S/a} S^{3/2}$$

$$f'(S) = B \left[ e^{-S/a}\, \tfrac{3}{2} S^{1/2} - \tfrac{1}{a}\, e^{-S/a} S^{3/2} \right] = 0$$

$$S = \tfrac{3}{2}\, a$$

The proper smoothing time is 3/2 of the
average segment length, and is independent of the
time of flight and all other factors.
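
This result can be confirmed by brute force. The sketch below assumes the Poisson form e^(-l/a) for P and a hit probability proportional to S^(3/2); the value a = 10 seconds and the two times of flight are hypothetical.

```python
import math

def hit_rate(S, a, t_f):
    # f(S) = P(S + t_f) * Q(S) with P(l) = exp(-l / a) and Q(S) proportional to S**1.5.
    return math.exp(-(S + t_f) / a) * S ** 1.5

def best_smoothing_time(a, t_f, step=0.01):
    # Brute-force search over 0 < S < 10a; adequate for a sketch.
    candidates = [k * step for k in range(1, int(10 * a / step))]
    return max(candidates, key=lambda S: hit_rate(S, a, t_f))

a = 10.0                                   # hypothetical mean segment length, seconds
short_flight = best_smoothing_time(a, t_f=5.0)
long_flight = best_smoothing_time(a, t_f=20.0)
print(short_flight, long_flight)
```

Both searches should land at about 1.5a, since the time of flight only scales f(S) by a constant factor.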

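The presence of gun dispersion and computer
errors which are independent of smoothing
time decreases the best S from this value. In
this case the equation for optimal S is the
quadratic

$$\sigma_1^2 S^2 + \sigma_2^2\, a S - \tfrac{3}{2}\, \sigma_2^2\, a^2 = 0 ,$$

whose positive root is

$$\frac{S}{a} = \frac{-\sigma_2^2 + \sigma_2 \sqrt{\sigma_2^2 + 6 \sigma_1^2}}{2 \sigma_1^2} .$$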

Here σ_1 is the part of the errors which is
independent of smoothing time (dispersion
errors in the computer, etc.) and σ_2 is the error
which varies inversely with the square root of
S, σ_2 being its value at S = a. Ordinarily σ_1 is
several times σ_2, in which case we have
approximately

$$\frac{S}{a} \approx \sqrt{\frac{3}{2}}\; \frac{\sigma_2}{\sigma_1} .$$
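
The effect of gun dispersion can be explored with a small numerical sketch. It assumes that the hit probability is e^(-S/a) divided by the three-halves power of the total variance σ_1² + σ_2² a/S, as described above; setting the derivative to zero then gives a quadratic in S/a whose positive root is computed by the hypothetical helper `closed_form_ratio`. The values σ_1 = 3, σ_2 = 1 and a = 10 are illustrative only.

```python
import math

def hit_rate(S, a, sigma_gun, sigma_track):
    # exp(-S/a) is the segment-survival factor up to a constant;
    # the denominator is the dispersion volume, (total variance)**1.5.
    variance = sigma_gun ** 2 + sigma_track ** 2 * a / S
    return math.exp(-S / a) / variance ** 1.5

def closed_form_ratio(sigma_gun, sigma_track):
    # Positive root u = S/a of
    #   sigma_gun^2 u^2 + sigma_track^2 u - 1.5 sigma_track^2 = 0.
    g, k = sigma_gun ** 2, sigma_track ** 2
    return (-k + math.sqrt(k * k + 6.0 * g * k)) / (2.0 * g)

def numeric_ratio(sigma_gun, sigma_track, a=10.0, step=0.001):
    candidates = [k * step for k in range(1, int(3 * a / step))]
    best = max(candidates, key=lambda S: hit_rate(S, a, sigma_gun, sigma_track))
    return best / a

exact = closed_form_ratio(3.0, 1.0)
brute = numeric_ratio(3.0, 1.0)
rough = math.sqrt(1.5) * 1.0 / 3.0     # leading term when gun dispersion dominates
print(exact, brute, rough)
```

Under these assumptions the best smoothing time is a fraction of a, well below the 3a/2 found without gun dispersion.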


There  are  other  factors  which  we  have  neg- 
lected, which  decrease  the  best  smoothing  time 
still  further.  The  wandering  of  the  target  about 
the  predictable  segments  assumed  in  the  above 
simplified  analysis  makes  old  data  less  reliable 
and  therefore  reduces  S.  Also,  there  is  the  tac- 
tical consideration  that  when  starting  to  track 
a  target  it  is  desirable  to  commence  firing  as 
soon  as  possible,  even  if  reducing  this  time 
makes  individual  hits  somewhat  less  probable. 
For  these  and  other  reasons  the  best  smooth- 
ing time  will  be  just  a  fraction  of  a. 





The  compromise  required  in  choosing  a  cer- 
tain definite  smoothing  time  can  be  eliminated 
by  the  use  of  nonlinear  elements.  In  particular, 
if  a  method  is  devised  for  determining  when 
changes  of  course  occur,  this  indication  can  be 
used  to  start  a  new  linear  but  variable  smooth- 
ing operation,  so  that  the  device  uses  all  the 
data  pertinent  to  the  present  segment  and  no 
data  from  previous  segments.  There  is  a  clear 
improvement  in  such  cases  although  not  so 
great  as  might  be  expected.  There  are  many 
practical  difficulties  in  proper  adjustment  of 
such  a  "trigger"  action.  If  the  trigger  is  too 
sensitive  it  will  assume  new  segments  due 
merely  to  tracking  noise  and  seldom  allow  suffi- 
cient smoothing  for  accurate  fire.  If  it  is  too 
insensitive  it  fails  in  its  function  of  quickly 

locating  changes  of  segment.  Since  the  noise 
and  target  courses  are  subject  to  considerable 
variation, this adjustment is not easy.

In  such  a  system  the  smoothing  may  be 
linear — the  only  nonlinearity  is  the  tripping 
circuit.  The  analysis  of  best  weighting  func- 
tions, etc.,  given  in  later  chapters  can  for  the 
most  part  be  applied  to  such  cases.  There  may 
also  be  advantages  to  be  derived  from  making 
the  smoothing  operator  depend  on  the  general 
position  in  space  of  the  target  relative  to  the 
gun.  The  smoothing  time  may  be  varied,  for 
example,  as  a  function  of  the  time  of  flight. 
This  type  of  variation  would  be  slow  compared 
to  the  noise  frequency,  and  here  again  the 
linear  analysis  can  be  used. 

Whether  any  real  advantage  can  be  obtained 
by  "strongly"  nonlinear  smoothing  in  practical 
cases other than these two possibilities is
questionable.


Chapter  10 


The  analytic  arc  assumption  described  in 
the  previous  chapter  immediately  allows  us 
to  reduce  a  vast  proportion  of  data-smoothing 
problems to a relatively concrete form. Obviously
the arc will be specified by a number of
parameters  and  the  principal  object  of  the  com- 
puting and  data-smoothing  circuits  must  be  to 
isolate  values  of  these  parameters  on  the  basis 
of  which  a  prediction  can  be  made.  In  practi- 
cal cases  the  instantaneous  values  of  the 
parameters  are  isolated  by  coordinate  con- 
verters. The  function  of  the  data-smoothing 
circuit  is  to  provide  a  suitable  average  from 
these instantaneous values. This is called
"smoothing a constant" here since the parameters
are assumed to be constant along each
arc,  although  they  may  change  radically  from 
one  arc  to  another. 

The  data-smoothing  network  is  most  con- 
veniently specified  by  its  impulsive  admittance. 
(See  Appendix  A.)  In  accordance  with  the 
assumptions  made  in  the  previous  chapter,  it 
will  be  assumed  that  the  desired  impulsive  ad- 
mittance is  identically  zero  after  some  limiting 
time  T.  Thus,  T  seconds  after  a  change  from 
one  analytic  arc  to  the  next  the  new  parameter 
value  is  established.  T  is  the  so-called  "settling 
time"  of  the  data-smoothing  network. 

With  the  settling  time  limit  given,  the  prob- 
lem of  choosing  a  suitable  data-smoothing  net- 
work reduces  to  that  of  finding  the  best  shape 
of  the  impulsive  admittance  characteristic  for 
t  <  T.  Obviously  this  shape  determines  how 
the  output  of  the  network  changes  in  going 
from  the  parameter  value  appropriate  for  the 
first  arc  to  that  appropriate  for  the  second.  The 
exact  way  in  which  the  response  settles  from 
one  constant  value  to  the  next  is,  however, 
usually  of  comparatively  little  interest.  The 
shape  of  the  weighting  function  is  of  impor- 
tance chiefly  because  of  its  effect  on  the  noise. 
For  each  noise  spectrum  there  is,  in  principle, 
an  optimum  shape  for  the  weighting  function. 
The  present  chapter  approaches  the  problem  of 
choosing  a  shape  which  will  minimize  the  effect 
of  noise  from  several  points  of  view. 

It  should  be  noted  that  the  term  noise  as  used 
here  does  not  necessarily  refer  to  the  errors 
associated  directly  with  the  tracking  data.  The 
tracking  data  may  have  been  subjected  to  co- 
ordinate conversions,  differentiations,  or  other 
processes  of  computation  before  reaching  the 
data-smoothing network.[a] The noise associated
with  the  signal  to  be  smoothed  thus  will  usually 
have  characteristics  differing  from  those  of  the 
noise  associated  with  the  tracking  data. 


Before  attacking  the  problem  of  smoothing  a 
constant  in  a  systematic  way  it  is  worth  while 
to  consider  an  important  special  case.  This  is 
the  so-called  exponential  smoothing  circuit.  It 
leads  to  a  data-smoothing  network  in  which 
the  output  V  is  related  to  the  input  E  by 


$$V(t) = \alpha \int_0^{\infty} e^{-\alpha \tau}\, E(t - \tau)\, d\tau ,$$

where α is a constant,
so that the impulsive admittance W(t) is an
exponential  function  of  time,  as  illustrated  by 
Figure  1. 


Figure  1.    Simple  exponential  weighting  function. 

An  impulsive  admittance  of  the  type  shown 
in  Figure  1  does  not  show  any  very  definite 
settling  time.  The  exponential  curve  ap- 
proaches zero  gradually,  and  it  is  a  long  time 
after  a  change  in  course  before  the  effects  of 
the  data  obtained  on  the  old  course  are  negli- 
gible. This  is  obviously  an  undesirable  result, 

[a] In exceptional circumstances the physical apparatus
in  which  these  processes  are  carried  out  may  also  be 
sources  of  additional  noise. 





and  the  exponential  weighting  function  is  con- 
sequently not  a  recommended  one  for  situations 
to  which  the  analytic  arc  assumption  applies. 
The  exponential  solution  is,  however,  described 
here  because  it  occurs  in  such  a  vast  variety  of 
cases.  It  is  found,  in  fact,  whenever  the  data- 
smoothing  device  is  specified  by  a  linear  first- 
order  differential  equation  with  constant  coeffi- 
cients. It  may  thus  correspond  to  many  simple 
situations.  For  example,  this  is  the  result 
which  would  be  obtained  in  an  electrical  circuit 
if  we  smoothed  the  data  by  placing  a  simple 
shunt  capacity  across  a  resistance  circuit.  In 
mechanical  structures  it  is  encountered  when- 
ever the  damping  depends  either  upon  simple 
inertia  or  a  simple  compliance. 

Simple  exponential  smoothing  also  occurs  in 
a  variety  of  other  situations  which  may  be 
somewhat  less  obvious.  For  example,  it  is  the 
effective  result  in  either  an  aided  laying  or  a 
regenerative  tracking  scheme  whenever  the 
ratio  between  rate  and  displacement  correc- 
tions is  fixed.  Another  somewhat  similar  ex- 
ample is  furnished  by  the  feedback  amplifier 
circuit  shown  in  Figure  2.  Since  rapid  fluctua- 

Figure 2. Feedback amplifier circuit giving simple
exponential  weighting  function. 

tions  in  the  output  of  this  amplifier  are  fed 
back  through  the  capacity  and  tend  to  oppose 
the  input  voltage,  the  structure  acts  as  a 
smoother,  and  more  detailed  analysis  would 
show  that  it  has  characteristics  similar  to  those 
obtained  by  using  a  shunt  capacity  across  a 
resistance  circuit.  The  structure  is  introduced 
here  because  considerable  use  is  made  of  it  in 
connection  with  the  discussion  of  nonlinear 
smoothing  in  a  later  chapter. 

One  simple  conclusion  about  data-smoothing 
networks  can  be  drawn  immediately  from  this 
discussion.  Since  all  structures  simple  enough 
to  be  specified  by  a  first-order  differential  equa- 

tion give  exponential  smoothing,  which  has  no 
very  well-marked  settling  time,  it  is  clear  that 
a  data-smoothing  network  which  shows  a  well- 
defined  settling  time  must  probably  be  at  least 
moderately  complicated. 
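
The lack of a definite settling time can be put in numbers. The sketch below assumes a unit time constant and a first-order smoother of the form (time constant) x dV/dt + V = E; the function names are hypothetical.

```python
import math

def exponential_tail(age, time_constant):
    # Fraction of the total weight of exp(-t / time_constant) lying beyond `age`.
    return math.exp(-age / time_constant)

def age_for_tail(tail, time_constant):
    # Age beyond which only the fraction `tail` of the weight remains.
    return -time_constant * math.log(tail)

def first_order_step_response(time_constant, duration, dt=0.001):
    # Euler integration of  time_constant * dV/dt + V = E  for a unit step in E.
    v = 0.0
    for _ in range(int(round(duration / dt))):
        v += dt * (1.0 - v) / time_constant
    return v

one_percent_age = age_for_tail(0.01, time_constant=1.0)
residual_after_3 = 1.0 - first_order_step_response(1.0, 3.0)
print(one_percent_age, residual_after_3)
```

Under these assumptions the tail does not fall to 1 per cent until about 4.6 time constants have elapsed, and roughly 5 per cent of a step is still unaccounted for after three.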


Consider the signal E shown in Figure 3
under the assumption that the true signal is
constant and the superposed noise is random
with a flat spectrum.

Figure 3. Piecewise constant signal with noise.

The best constant A, in the least squares sense,
which can be fitted to the signal from t - T to t
is that which minimizes

$$\int_{t-T}^{t} [A - E(\lambda)]^2\, d\lambda ,$$

namely

$$A = \frac{1}{T} \int_{t-T}^{t} E(\lambda)\, d\lambda . \qquad (1)$$

Comparing this with equation (2), Appendix
A, it will be seen that A, which is obviously a
function of t, is the response to the assumed
signal of a network whose impulsive
admittance is

$$W(t) = \frac{1}{T} , \qquad 0 < t < T . \qquad (2)$$


This  is  the  best  weighting  function  for  smooth- 
ing under  the  assumed  circumstances.  It  is 
illustrated  in  Figure  4. 
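
In sampled form the uniform weighting function is simply a running mean over the last T seconds of data. The following sketch uses made-up sample values and a hypothetical window of two samples to show both the least squares property and the sharp settling.

```python
def best_constant(samples):
    # Minimizing sum (A - E_k)^2 over A gives the arithmetic mean.
    return sum(samples) / len(samples)

def squared_error(A, samples):
    return sum((A - e) ** 2 for e in samples)

def uniform_smoother(signal, window):
    # Output at each instant = mean of the most recent `window` samples, i.e. a
    # weighting function equal to 1/window inside the window and zero outside.
    return [best_constant(signal[k - window + 1:k + 1])
            for k in range(window - 1, len(signal))]

noisy = [4.8, 5.3, 5.1, 4.6, 5.2]          # hypothetical noisy constant
A = best_constant(noisy)
step = uniform_smoother([0.0, 0.0, 4.0, 4.0, 4.0], window=2)
print(A, step)
```

Once the window holds only samples from the new level, the output is exactly the new value; no trace of the old data remains.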

Figure 4. Best weighting function for smoothing
piecewise constant signal.

A more complex situation is one in which the
true signal is a line of constant slope with
superposed flat random noise, as shown in
Figure 5. For convenience the analysis will be
conducted in terms of the age variable τ = t - λ.

Figure 5. Piecewise linearly varying signal with
noise.


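The best straight line A - Bτ which can be
fitted to the signal from τ = 0 to τ = T is that
which minimizes

$$\int_0^T [A - B\tau - E(t - \tau)]^2\, d\tau .$$

Hence A and B must satisfy simultaneously

$$A - \frac{T}{2}\, B = \frac{1}{T} \int_0^T E(t - \tau)\, d\tau$$

$$\frac{T}{2}\, A - \frac{T^2}{3}\, B = \frac{1}{T} \int_0^T \tau\, E(t - \tau)\, d\tau \qquad (3)$$

Eliminating A, we get

$$B = \frac{6}{T^3} \int_0^T (T - 2\tau)\, E(t - \tau)\, d\tau$$

whence by partial integration

$$B = \frac{6}{T^3} \int_0^T E'(t - \tau) \cdot \tau (T - \tau)\, d\tau .$$

Comparing this with (7), Appendix A, it will
be seen that B, which is obviously a function of
t, is the response to the derivative of the
assumed signal of a network whose impulsive
admittance is

$$W(t) = \frac{6}{T} \cdot \frac{t}{T} \left( 1 - \frac{t}{T} \right) , \qquad 0 < t < T . \qquad (4)$$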


This  is  the  best  weighting  function  for  smooth- 
ing the  derivative  of  the  signal  under  the  as- 
sumed circumstances.  It  is  illustrated  in  Fig- 
ure 6  and  is  generally  referred  to  as  the  "para- 
bolic weighting  function." 

Figure 6. Best weighting function for smoothing
piecewise linearly varying signal.

It should be noted also that the right-hand
member of the first of equations (3) is
formally the same as that of equation (1). Hence
the response of the network specified by (2)
and illustrated in Figure 4, to the type of
signal  shown  in  Figure  5,  will  correspond  to 
the  value  on  the  best  straight  line  T/2  seconds 
back  from  t,  the  present  time.  This  network  is 
still  the  best  for  smoothing  the  signal,  but  it 
introduces  a  delay  of  one  half  of  the  smooth- 
ing time.  The  delay  may  be  reduced  only  at 
the  price  of  a  reduction  in  smoothing  unless  the 
smoothing  time  is  increased. 
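
Both statements can be verified numerically. The sketch below assumes the least squares slope over the window is B = (6/T³) ∫ (T - 2τ) E(t - τ) dτ, which integration by parts turns into a parabolic-weighted average of the derivative; the test signals and the values t = 10, T = 4 are hypothetical.

```python
import math

def midpoint_integral(f, lower, upper, steps=2000):
    h = (upper - lower) / steps
    return sum(f(lower + (k + 0.5) * h) for k in range(steps)) * h

def slope_from_signal(E, t, T):
    # B = (6 / T^3) * integral_0^T (T - 2 tau) E(t - tau) d tau
    return 6.0 / T ** 3 * midpoint_integral(
        lambda tau: (T - 2.0 * tau) * E(t - tau), 0.0, T)

def slope_from_derivative(dE, t, T):
    # The same B written as a parabolic-weighted average of the derivative.
    return 6.0 / T ** 3 * midpoint_integral(
        lambda tau: tau * (T - tau) * dE(t - tau), 0.0, T)

def uniform_average(E, t, T):
    return midpoint_integral(lambda tau: E(t - tau), 0.0, T) / T

def line(x):
    return 3.0 + 2.0 * x

def wiggly(x):
    return 3.0 + 2.0 * x + 0.5 * math.sin(3.0 * x)

def d_wiggly(x):
    return 2.0 + 1.5 * math.cos(3.0 * x)

t, T = 10.0, 4.0
slope = slope_from_signal(line, t, T)
lagged = uniform_average(line, t, T)
print(slope, lagged, line(t - T / 2.0))
```

For the straight line the uniform average reproduces the value T/2 seconds back, which is the delay described in the text.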


The  autocorrelation  method  with  finite  set- 
tling time  was  first  used  by  G.  R.  Stibitz  in 
numerical  determination  of  the  best  weighting 
function  for  smoothing  the  derivative  of  track- 
ing data  with  typical  tracking  errors.  This 
method  was  also  used  to  determine  the  sensitiv- 
ity of  smoothing  to  departures  of  the  weighting 
function  from  the  best  form. 

The analysis is based upon the expression

$$\int_0^T g'(t - \tau)\, W(\tau)\, d\tau , \qquad t > T$$

for  the  response  to  the  derivative  of  the  error 
time  function  g(t)  of  a  network  whose  impul- 
sive admittance  or  weighting  function  W(t)  is 
identically  zero  for  t  >  T  as  well  as  for  t  <  0. 
Since  measured  tracking  errors  are  generally 
tabulated  only  at  1-second  intervals,  the  in- 
tegral may  be  approximated  by  the  sum 

- 1 



for  integral  values  of  t. 

The  instantaneous  transmitted  power  is  the 




square  of  this  expression,  and  the  average 
transmitted  power  is 

P.v,  =  hill   J.  V  yttt\ 

*  , To 

This may be expressed in the form

$$P_{av} = \sum_m \sum_n W_m \cdot C_{m-n} \cdot W_n \qquad (5)$$


M.a  -  1 


m  —  u 

is  the  autocorrelation  of  the  errors.  Having 
computed  the  autocorrelation,  (5)  may  be  mini- 
mized with  respect  to  the  W's  by  familiar 
methods, under the constraint

$$\sum_n W_n = 1 .$$

The values of W thus obtained are the
specification of the best weighting function.[b]
Equation (5) may then be used to determine the
sensitivity  of  smoothing  to  departures  of  the 
weighting  function  from  the  best  form. 
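
One of the "familiar methods" is a Lagrange multiplier. The sketch below assumes that the average transmitted power is a quadratic form, a double sum of W_m C_(m-n) W_n over the sampled weights, with the weights constrained to sum to unity; the autocorrelation values are hypothetical, and the small elimination routine stands in for whatever numerical procedure was actually used.

```python
def solve(matrix, rhs):
    # Gaussian elimination with partial pivoting on a small dense system.
    n = len(rhs)
    rows = [list(matrix[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (rows[i][n] - tail) / rows[i][i]
    return solution

def best_weights(autocorrelation):
    # Minimize sum_m sum_n W_m C_{m-n} W_n subject to sum_n W_n = 1.
    # The multiplier condition is C W = mu * 1, so W is C^{-1} 1 scaled to unit sum.
    n = len(autocorrelation)
    C = [[autocorrelation[abs(i - j)] for j in range(n)] for i in range(n)]
    raw = solve(C, [1.0] * n)
    scale = sum(raw)
    return [w / scale for w in raw]

def transmitted_power(weights, autocorrelation):
    n = len(weights)
    return sum(weights[i] * autocorrelation[abs(i - j)] * weights[j]
               for i in range(n) for j in range(n))

flat = best_weights([1.0, 0.0, 0.0, 0.0])             # uncorrelated errors
hypothetical_C = [1.0, 0.5, 0.25, 0.125]              # assumed autocorrelation
correlated = best_weights(hypothetical_C)
print(flat, correlated)
```

With uncorrelated errors the weights come out uniform; with the assumed correlated errors they are symmetric about the middle of the window and give less power than the uniform set.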

Proceeding  along  this  line,  Stibitz  found  that 
the  best  weighting  function  for  typical  actual 
tracking  errors  was  generally  intermediate  to 
the  uniform  and  parabolic  ones  shown  in  Fig- 
ures 4  and  6.  Furthermore,  Stibitz  found 
that  the  difference  in  smoothing  obtained  from 
the  best  weighting  function  on  the  one  hand 
and  from  the  uniform  or  the  parabolic  weight- 
ing function  on  the  other  hand,  is  negligible  in 

The  autocorrelation  method  was  later  for- 
malized by  R.  S.  Phillips  and  P.  R.  Weiss  who 
incorporated  it  into  a  theory  of  prediction.7  A 
brief  exposition  of  this  formulation  is  given 
in  Appendix  B. 


For the purposes of this method, an
elementary noise pulse is defined by a time
function F_0(t) which satisfies the following
requirements:

1. Identically zero when t < 0.

2. Contains no terms which increase
exponentially with time.

3. Power spectrum N(ω²) is the same as that
of the noise.

The  noise  is  then  regarded  as  the  result  of 
elementary  noise  pulses  started  at  random. 
Alternatively,  it  may  be  regarded  as  the  result 
of  flat  random  noise  passed  through  a  network 
whose transmission function is S(p) = L[F_0(t)].
As a matter of fact, only S(p) is
required in the analysis, and this is readily
determined from the relation

$$|S(i\omega)|^2 = N(\omega^2) ,$$

together with the condition that S(iω)
corresponds to the transmission function of a
minimum-phase physical structure (cf.
Appendix B).

The response F(t) to the elementary noise
pulse F_0(t) of a network whose impulsive
admittance is W(t) is given by the operational
equation

$$F(t) = S(p) \cdot W(t)$$

in accordance with the footnote in Section A.5,
Appendix A. The best form for W(t) is
therefore that which minimizes the integral

$$\int_0^{\infty} [F(t)]^2\, dt \qquad (6)$$

under the restriction

$$\int_0^{t_0} W(t)\, dt = 1 \quad \text{when } t_0 > T . \qquad (7)$$



[b] The computations involved may be considerably
duced by  noting  the  symmetry  property  proved  in  Sec- 
tion B.2,  Appendix  B. 

This  is  as  much  of  the  elementary  pulse 
method  as  we  shall  need  in  order  to  reconsider 
the  cases  treated  in  Section  10.2.  For  the  treat- 
ment of  more  general  cases  the  method  is  de- 
scribed in  greater  detail  in  Appendix  B. 

The  minimization  of  the  integral  (6)  under 
the  restriction  (7)  reduces  to  a  simple  isoperi- 
metric  problem  in  the  calculus  of  variations,  in 
cases  in  which  S(p)  is  a  polynomial  in  p.  It  is 
essential first of all, however, to note that if
S(p) is of degree n, the integral (6) will
converge only if W(t) is differentiable at least n
times. In other words, W(t) must have
continuous derivatives of all orders up to the
(n-1)th inclusive, although the nth derivative
may have finite discontinuities. In particular,
if W(t) is to be zero outside of 0 < t < T, its
derivatives of orders up to the (n-1)th
inclusive must vanish at both t = 0 and t = T. These
2n boundary conditions must be imposed on the
solution of the Euler equation which in this
case is

$$S(p)\, S(-p)\, W(t) = \lambda .$$

λ is a constant parameter which is finally
adjusted so that the restriction (7) is satisfied.

The first case treated in Section 10.2 is one
in which N(ω²) = 1, whence S(p) = 1 and F(t)
= W(t). The integral (6) is a minimum under
the restriction (7) if W(t) is constant by
intervals. The restriction (7) then requires
W(t) to be of the form (2).

The case of first derivative smoothing treated
in 10.2 is one in which N(ω²) = ω², whence S(p)
= p and F(t) = W'(t). If the integral (6) is to
converge at all, W'(t) must not have
discontinuities of impulsive or higher type; in other
words, W(t) must be continuous through all
values of t. The integral is a minimum under
the restriction (7) if W''(t) is constant by
intervals. The restriction (7) then requires
W(t) to be of the form (4).

These results may be generalized
immediately. In whatever way the signal to be
smoothed may have been derived from the
tracking data, let the power spectrum of the
noise associated with it be N(ω²) = ω^(2n). Then
S(p) = p^n and F(t) = W^(n)(t). If the integral
(6) is to converge at all, W^(n-1)(t) must be
continuous through all values of t. The integral is
a minimum under the restriction (7) if
W^(2n)(t) is constant by intervals. The
restriction (7) then requires W(t) to be of the form

$$W(t) = \frac{(2n+1)!}{(n!)^2\, T} \left[ \frac{t}{T} \left( 1 - \frac{t}{T} \right) \right]^n , \qquad 0 < t < T . \qquad (8)$$

It  may  be  noted  that  the  convergence  re- 
quirements which  arise  in  the  foregoing  dis- 
cussion are  directly  related  to  the  discussion 
and  theorem  in  Section  A.8,  Appendix  A,  with 
respect  to  the  relationship  between  discontinui- 
ties in  the  impulsive  admittance  and  its  deriva- 
tives on  the  one  hand,  and  the  ultimate  cutoff 
characteristic  of  the  transmission  function  on 
the other hand. The continuity of W^(n-1)(t) is
obviously  required  to  make  the  transmission 
fall  off  ultimately  at  the  rate  of  6(n+l)  db  per 
octave  against  the  rise  of  6n  db  per  octave  in 
the  noise  power  spectrum. 

The  integral  (6)  may  also  be  used  to  evalu- 
ate the  relative  advantage  of  the  best  weighting 
function  over  another  weighting  function.  As 
an  example,  consider  the  case  where  the  weight- 
ing function  (2)  is  the  best.  The  value  of  the 
integral (6) in this case is 1/T. If the
weighting function (4) is used against the same noise,
the value of the integral (6) is 6/(5T). Hence,
as far as rms error or standard deviation is
concerned, the second weighting function is
√(5/6) or 0.913 as efficient as the first.
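
These figures are easy to reproduce. The sketch below assumes the family W(t) = (2n+1)!/((n!)² T) [(t/T)(1 - t/T)]^n on 0 < t < T, which gives the uniform function for n = 0 and the parabolic one for n = 1; T = 2 is an arbitrary choice and the integrals are taken by the midpoint rule.

```python
import math

def weighting(n, t, T):
    # Polynomial weighting function of order n, zero outside 0 < t < T.
    if not 0.0 < t < T:
        return 0.0
    coefficient = math.factorial(2 * n + 1) / (math.factorial(n) ** 2 * T)
    x = t / T
    return coefficient * (x * (1.0 - x)) ** n

def midpoint_integral(f, lower, upper, steps=20000):
    h = (upper - lower) / steps
    return sum(f(lower + (k + 0.5) * h) for k in range(steps)) * h

T = 2.0
areas = [midpoint_integral(lambda t: weighting(n, t, T), 0.0, T) for n in range(4)]
# Integral of W^2: the mean square output for flat noise of unit spectral density.
flat_noise_uniform = midpoint_integral(lambda t: weighting(0, t, T) ** 2, 0.0, T)
flat_noise_parabolic = midpoint_integral(lambda t: weighting(1, t, T) ** 2, 0.0, T)
efficiency = math.sqrt(flat_noise_uniform / flat_noise_parabolic)
print(areas, flat_noise_uniform, flat_noise_parabolic, efficiency)
```

Every member of the family has unit area, and against flat noise the parabolic function comes out about 0.913 as efficient as the uniform one in rms terms.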


Chapter  11 


THE  THEORY  of  "smoothing  a  constant"  de- 
veloped in  the  preceding  chapter  will  be 
extended  in  this  chapter  to  the  problem  of 
smoothing  a  polynomial  function  of  time  of  any 
prescribed  degree.  The  extension  is,  however, 
restricted  to  the  case  of  a  flat  noise  spectrum. 
In  addition  to  the  smoothing  problem,  the 
analysis  also  provides  a  way  of  designing  a 
network  which  will  extrapolate  the  polynomial 
a given distance t_f into the future. The network
is so arranged that t_f is continuously variable.
In  addition,  the  degree  of  the  polynomial  can 
readily  be  changed  to  fit  changes  in  the  com- 
plexity of  the  assumed  form  of  the  data,  apart 
from  noise. 

It  is  clear  that  these  results  amount,  in  a 
certain  sense,  to  an  alternative  to  Wiener's 
method  for  the  design  of  prediction  circuits  for 
general  time  series.  Thus,  to  predict  a  time 
series  of  any  given  complexity  we  would  need 
only  to  begin  with  a  polynomial  of  sufficiently 
high  degree  to  fit  the  observed  data,  and  extra- 
polate. Aside  from  the  restriction  to  a  flat 
noise  spectrum,  perhaps  the  most  obvious  dif- 
ference from  Wiener's  method  is  the  fact  that 
the  settling  time  restriction  limits  the  data 
upon  which  the  prediction  rests  to  a  finite  in- 
terval in  the  past.  To  advance  such  a  prediction 
theory  seriously,  however,  it  would  be  neces- 
sary to  go  much  farther  into  the  way  in  which 
the  degree  of  the  polynomial  is  established  and 
the justification for assuming that the
extrapolated value represents a probable future
value for the function.[a]

This  general  discussion  will  not  be  under- 
taken here.  Since  prediction  with  high  degree 
polynomials  will  certainly  be  sensitive  to  minor 
irregularities  in  the  data,  tracking  errors 
would  necessarily  limit  the  application  of  the 
method  in  any  case.  If  we  confine  ourselves  to 
reasonably  low  degree  polynomials,  however, 

[a] As an example of possible difficulties we may notice
the  fact  that  two  polynomials  of  different  degree  which 
approximate  a  given  function  as  closely  as  possible,  in 
a  least  squares  sense,  in  a  prescribed  interval  fre- 
quently differ  radically  outside  that  interval. 

the  method  is  useful.  An  example  is  furnished 
by  the  prediction  of  airplane  position,  in  rec- 
tangular coordinates,  by  quadratic  functions  of 
time.  Here  the  square  terms  represent  the 
effects  of  accelerations  in  the  various  coordi- 
nates. We  can  defend  the  inclusion  of  such 
terms  on  the  ground  that  it  is  plausible  to  as- 
sume that  an  airplane  may  experience  constant 
accelerations,  due  to  turns,  the  force  of  gravity, 
etc.,  for  considerable  periods  of  time.  The 
linear  term  represents  plane  velocity  and  needs 
no  defense.  The  constant  term,  of  course,  gives 
the plane position at some reference time.
Including it in the smoothing operation is
equivalent to introducing "present-position"
smoothing of the sort suggested by the broken lines
in Figure 1 of Chapter 7.[b]

Aside  from  its  direct  interest  as  a  possible 
prediction  method,  the  analysis  in  this  chapter 
is  also  of  indirect  interest  for  the  additional 
light  it  sheds  on  the  effect  of  the  noise  spec- 
trum on  smoothing  functions.  It  turns  out  that 
smoothing  a  power  of  time,  with  a  flat  noise 
spectrum,  is  equivalent  to  smoothing  a  constant 
with  a  somewhat  different  noise  spectrum. 
Thus  the  smoothing  functions  developed  for 
polynomials  are  also  useful  as  special  cases  of 
smoothing  functions  applicable  to  constants. 


Let λ be any past value of time and let t be
the present value. If the data is fitted with a
smooth curve Ē(λ), the predicted value may be
taken as Ē(t + t_f). The procedure of fitting is
the familiar one of minimizing the integral

$$\int_{-\infty}^{t} [\bar{E}(\lambda) - E(\lambda)]^2\, W_0(t, \lambda)\, d\lambda \qquad (1)$$

[b] In the circuit of Figure 1, Chapter 7, however, the
smoothing  network  would  produce  a  lag  in  the  present- 
position  data  delivered  to  the  prediction  circuit,  and 
this  lag  would,  of  course,  mean  some  error  in  follow- 
ing a  moving  target.  In  the  method  described  in  this 
chapter  such  lags  are  automatically  compensated  for 
by  adjustments  in  the  coefficients  of  the  other  terms  of 
the  polynomial. 




with respect to disposable parameters in Ē(λ)
and a prescribed weighting function W_0(t,λ).
The lower limit of the integral is indicated as
-∞ in compliance with the physical
impossibility of discriminating between relevant and
irrelevant data, with fixed linear networks,
except on the basis of age. The burden of
discrimination must be relegated to the weighting
function which must be a function only of the
age t - λ. Under the ideal restriction that
W_0(t - λ) is identically zero when t - λ > T or
λ < t - T, the indicated lower limit of the
integral is purely nominal.

As in Section 10.2, it is convenient to conduct the analysis in terms of the age variable $\tau = t - \lambda$ introduced there. If

$$F(\tau) = E(t - \tau) = E(\lambda), \qquad \bar{F}(\tau) = \bar{E}(t - \tau) = \bar{E}(\lambda),$$

the integral to be minimized may be expressed in the form

$$\int_0^{\infty} \left[\bar{F}(\tau) - F(\tau)\right]^2 W_0(\tau)\, d\tau . \qquad (1)$$

In accordance with the discussion of quasi-distortionless transmission networks in Section A.10, Appendix A, the smooth curve $\bar{E}(\lambda)$ should be a polynomial in $\lambda$. Hence $\bar{F}(\tau)$ should be a polynomial in $\tau$. It will be more convenient, however, to express $\bar{F}(\tau)$ formally as a linear combination of polynomials in $\tau$ which may be orthogonalized. Hence, let

$$\bar{F}(\tau) = V_0 + V_1 G_1(\tau) + V_2 G_2(\tau) + \cdots + V_n G_n(\tau) \qquad (2)$$

where $G_m(\tau)$ is an $m$th degree polynomial in $\tau$. Let $W_0(\tau)$ be normalized in the sense that

$$\int_0^{\infty} W_0(\tau)\, d\tau = 1$$

and the $G_m(\tau)$ be orthogonalized with respect to the weighting function $W_0(\tau)$ in the sense

$$\int_0^{\infty} G_l(\tau)\, G_m(\tau)\, W_0(\tau)\, d\tau = \begin{cases} 0 & \text{if } l \neq m \\ 1/k_m & \text{if } l = m \end{cases}$$

($G_0 = 1$, $k_0 = 1$).

The integral (1) is then a minimum with respect to the $V_m$'s in (2) if

$$V_m = k_m \int_0^{\infty} F(\tau)\, G_m(\tau)\, W_0(\tau)\, d\tau . \qquad (3)$$

In terms of the forward time $\lambda$, (2) and (3) reduce to

$$\bar{E}(\lambda) = V_0(t) + V_1(t)\, G_1(t - \lambda) + V_2(t)\, G_2(t - \lambda) + \cdots + V_n(t)\, G_n(t - \lambda) \qquad (4)$$

$$V_m(t) = k_m \int_{-\infty}^{t} E(\lambda)\, G_m(t - \lambda)\, W_0(t - \lambda)\, d\lambda . \qquad (5)$$

Expression (5) identifies the $V_m(t)$ as the responses to $E(\lambda)$ of fixed linear networks whose impulsive admittances are

$$W_m(\tau) = k_m\, G_m(\tau)\, W_0(\tau) . \qquad (6)$$

By (4), the predicted value may be obtained by a linear combination of the responses of these networks, viz.,

$$\bar{E}(t + t_f) = V_0(t) + G_1(-t_f)\, V_1(t) + G_2(-t_f)\, V_2(t) + \cdots + G_n(-t_f)\, V_n(t) . \qquad (7)$$

A schematic representation of an $n$th order smoothing and prediction circuit, based on (7), is shown in Figure 1, where the $G_m(-t_f)$ are represented as potentiometer factors dependent on the time of flight.




Figure  1.    Schematic  representation  of  nth  order 
smoothing  and  prediction  circuit. 

Alternatively, (7) may be written

$$\bar{E}(t + t_f) = \bar{E}(t) + \left[G_1(-t_f) - G_1(0)\right] V_1(t) + \cdots + \left[G_n(-t_f) - G_n(0)\right] V_n(t) \qquad (8)$$

where $\bar{E}(t)$ is then replaced by $E(t)$ when position data smoothing is to be omitted.

It is not necessary that the $G_m(\tau)$ polynomials be orthogonal. However, the circuit switching required to reduce or increase the order of the prediction is simplest when the $G_m(\tau)$ polynomials are orthogonal. Orthogonal polynomials corresponding to any weighting function $W_0(\tau)$ are readily derived by well-known methods.
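One such well-known method is Gram-Schmidt orthogonalization of the powers $1, \tau, \tau^2, \ldots$ under the weighting function. The sketch below is illustrative only and is not taken from the report: it assumes the weighting function is supplied through its moments $\mu_k = \int_0^\infty \tau^k W_0(\tau)\, d\tau$, it returns monic polynomials (any other scaling follows by multiplying each polynomial by a constant), and the function names are hypothetical.

```python
from fractions import Fraction


def poly_mul(a, b):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def inner(a, b, moments):
    """Integral of a(tau)*b(tau)*W0(tau), using moments[k] = integral of tau**k * W0."""
    return sum(c * moments[k] for k, c in enumerate(poly_mul(a, b)))


def orthogonal_polys(moments, n):
    """Gram-Schmidt on 1, tau, ..., tau**n; returns monic polynomials."""
    polys = []
    for m in range(n + 1):
        g = [Fraction(0)] * m + [Fraction(1)]  # start from tau**m
        for q in polys:
            # remove the component of g along each earlier polynomial
            coef = inner(g, q, moments) / inner(q, q, moments)
            g = [gi - coef * (q[i] if i < len(q) else 0) for i, gi in enumerate(g)]
        polys.append(g)
    return polys


# Assumed example: uniform weighting on 0 <= tau <= 1, so mu_k = 1/(k+1).
uniform_moments = [Fraction(1, k + 1) for k in range(7)]
G = orthogonal_polys(uniform_moments, 3)
```

For the uniform weighting assumed here the result is the set of shifted Legendre polynomials in monic form, for example $\tau - \tfrac{1}{2}$ and $\tau^2 - \tau + \tfrac{1}{6}$. Exact rational arithmetic is used so that orthogonality can be checked without rounding error.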

The weighting function $W_0(\tau)$ may be determined by either of the methods described in Appendix B as the best weighting function for smoothing position data, under prescribed tracking error characteristics. Then the best impulsive admittances $W_m(\tau)$ for a smoothing and prediction circuit are prescribed by (6).

The relationship (6) shows that if the prescribed weighting function $W_0(\tau)$ satisfies the formal requirements for physical realizability, so will all of the impulsive admittances $W_m(\tau)$. Of the standard sets of orthogonal polynomials those of Laguerre appear to be the best adapted to physical realization. The Laguerre polynomials $L_n^{(\alpha)}(\tau)$ are orthogonal in $0 \le \tau < \infty$ with the weighting function $\tau^{\alpha} e^{-\tau}$. However, such a weighting function is, in general, very unsatisfactory from the practical point of view of settling characteristics.

It is possible, of course, to approximate any prescribed weighting function $W_0(\tau)$ as closely as may be desired in a physically realizable form, derive a set of orthogonal polynomials based on the approximate form, and determine the impulsive admittances $W_m(\tau)$ from (6). However, such a procedure leads to complexities of network configuration which increase very rapidly with the index $m$. This increasing complexity is hardly justifiable in practice.

From the foregoing considerations, it appears that the most practical procedure is to derive all of the impulsive admittances $W_m(\tau)$ without regard to physical realizability, approximate them independently in physically realizable forms of independently prescribed complexities, and modify or redetermine the potentiometer factors in accordance with the discussion in Section A.10, Appendix A.


The impulsive admittances defined by (6) for $m > 0$ may not be regarded as weighting functions even though the response of the corresponding networks to $E(\lambda)$ is, by (5),

$$V_m(t) = \int_0^{\infty} E(t - \tau)\, W_m(\tau)\, d\tau ,$$

because, with the exception of $W_0(\tau)$, the $W_m(\tau)$, as will presently be seen, cannot be normalized. The term weighting function is reserved for the functions defined by (11) below.

Since $\tau^r$ is a linear combination of the $G_s(\tau)$ where $s = 0, 1, \ldots, r$, it is obvious from (6) that

$$\int_0^{\infty} \tau^r\, W_m(\tau)\, d\tau = 0 \qquad \text{when } r < m .$$

In particular

$$\int_0^{\infty} W_m(\tau)\, d\tau = 0 \qquad \text{when } m > 0 .$$

Since the transmission function $Y_m(p)$ of a network is the Laplace transform of its impulsive admittance (see Section A.3), we have

$$Y_m(p) = \int_0^{\infty} W_m(\tau)\, e^{-p\tau}\, d\tau = \sum_{r=0}^{\infty} \frac{(-p)^r}{r!} \int_0^{\infty} \tau^r\, W_m(\tau)\, d\tau . \qquad (9)$$

The first $m$ terms in this series vanish. Hence $Y_m(p)$ will be of the form

$$Y_m(p) = p^m\, y_m(p) \qquad (10)$$

where $y_m(0) \neq 0$. This permits us to regard the network whose impulsive admittance is $W_m(\tau)$ as an instantaneous $m$th order differentiator, corresponding to the factor $p^m$ in (10), in tandem with a purely smoothing network whose transmission function is $y_m(p)$.

It is convenient to associate a weighting function $w_m(\tau)$ with the purely smoothing network whose transmission function is $y_m(p)$. Dividing (10) through by $p^m$, the resulting operational equation may be interpreted (see Section A.5) to mean that the weighting function $w_m(\tau)$ is the $m$-fold integral of the impulsive admittance $W_m(\tau)$ between the limits 0 and $\tau$. This is expressed by

$$w_m(\tau) = \int_0^{\tau} \cdots \int_0^{\tau} W_m(\tau)\, (d\tau)^m . \qquad (11)$$

By a relationship similar to (9) between $y_m(p)$ and $w_m(\tau)$, it follows from $y_m(0) \neq 0$ that

$$\int_0^{\infty} w_m(\tau)\, d\tau \neq 0 .$$




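Hence the $w_m(\tau)$ may be normalized in the sense that

$$\int_0^{\infty} w_m(\tau)\, d\tau = 1$$

for all values of $m$. However, this may be done in general only if the $G_m(\tau)$ polynomials are not normalized in the sense that $k_m = 1$ for any value of $m > 0$. It is in fact readily shown that the coefficient of $\tau^m$ in $G_m(\tau)$ must be the same as that of $\tau^m$ in $e^{-\tau}$.

The Legendre polynomials $P_m(x)$ are orthogonal with respect to the range $-1 \le x \le 1$ and uniform weighting. In other words, the polynomials $P_m(2\tau - 1)$ are orthogonal with respect to the range $0 \le \tau < \infty$ and the weighting functionᶜ

$$W_0(\tau) = \begin{cases} 1 & \text{when } 0 \le \tau \le 1 \\ 0 & \text{when } \tau > 1 . \end{cases}$$

It is known from Section 10.4 that this form for the weighting function $W_0(\tau)$ is best in case the tracking errors are flat random noise. In the integral (1) to be minimized, the $G_m(\tau)$ polynomials should then be

$$G_m(\tau) = (-)^m\, \frac{m!}{(2m)!}\, P_m(2\tau - 1) .$$

The first few of these are tabulated below.

$m$    $G_m(\tau)$
0    $1$
1    $\frac{1}{2} - \tau$
2    $\frac{1}{12} - \frac{\tau}{2} + \frac{\tau^2}{2}$
3    $\frac{1}{120} - \frac{\tau}{10} + \frac{\tau^2}{4} - \frac{\tau^3}{6}$

With the help of the formula

$$\int_{-1}^{1} \left[P_m(x)\right]^2 dx = \frac{2}{2m + 1}$$

it is readily determined that

$$\frac{1}{k_m} = \int_0^{\infty} \left[G_m(\tau)\right]^2 W_0(\tau)\, d\tau = \frac{(m!)^2}{(2m)!\,(2m + 1)!} .$$

Then, by (6),

$$W_m(\tau) = \begin{cases} (-)^m\, \dfrac{(2m + 1)!}{m!}\, P_m(2\tau - 1) & 0 \le \tau \le 1 \\ 0 & \tau > 1 . \end{cases}$$

Substituting this in turn into (11) and making use of Rodrigues' formula

$$P_m(x) = \frac{(-)^m}{2^m\, m!}\, \frac{d^m}{dx^m} (1 - x^2)^m$$

or

$$P_m(2\tau - 1) = \frac{(-)^m}{m!}\, \frac{d^m}{d\tau^m} \left[\tau (1 - \tau)\right]^m$$

it is finally found that

$$w_m(\tau) = \begin{cases} \dfrac{(2m + 1)!}{(m!)^2} \left[\tau (1 - \tau)\right]^m & 0 \le \tau \le 1 \\ 0 & \tau > 1 . \end{cases}$$

By a relationship of the form of (9) the transmission functions $y_m(p)$ corresponding to the weighting functions $w_m(\tau)$ may be determined. The first three are

$$y_0(p) = \frac{1 - e^{-p}}{p}$$

$$y_1(p) = \frac{6}{p^3} \left[(p - 2) + (p + 2)\, e^{-p}\right]$$

$$y_2(p) = \frac{60}{p^5} \left[(p^2 - 6p + 12) - (p^2 + 6p + 12)\, e^{-p}\right] .$$

These may be written in the form

$$y_m(p) = Q_m(x)\, e^{-p/2} \qquad (13)$$

where, for $p = i\omega$ and $x = \omega/2$,

$$Q_0 = \frac{\sin x}{x}, \qquad Q_1 = \frac{3}{x^2} \left(\frac{\sin x}{x} - \cos x\right), \qquad Q_2 = \frac{15}{x^4} \left[(3 - x^2)\, \frac{\sin x}{x} - 3 \cos x\right] \qquad (14)$$

or in the infinite power-series form

$$y_1(p) = 6 \sum_{n=0}^{\infty} \frac{n + 1}{(n + 3)!}\, (-p)^n , \qquad y_2(p) = 60 \sum_{n=0}^{\infty} \frac{(n + 1)(n + 2)}{(n + 5)!}\, (-p)^n . \qquad (15)$$

ᶜ The unit of time being equal to the nominal smoothing time.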

Methods for obtaining physically realizable approximations to the weighting functions $w_m(\tau)$ or impulsive admittances $W_m(\tau)$, based upon the $Q$ functions (14) and the series expansions (15), are described in Chapter 12.
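As a numerical illustration of relations (3) and (7) for uniform weighting over unit smoothing time, the following sketch computes the responses $V_m$ and the predicted value for polynomial data. It is a hedged example rather than part of the report: the data are assumed to be an exact polynomial $E(\lambda)$ given by its coefficients, the polynomials $G_0, G_1, G_2$ are $1$, $\tfrac12 - \tau$ and $\tfrac1{12} - \tfrac{\tau}{2} + \tfrac{\tau^2}{2}$, the constants are $k_m = (2m)!\,(2m+1)!/(m!)^2$, and all function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def poly_mul(a, b):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def poly_eval(a, x):
    return sum(c * x ** i for i, c in enumerate(a))


def integrate_0_1(a):
    """Integral of a polynomial over 0 <= tau <= 1 (where W0 = 1)."""
    return sum(c / (i + 1) for i, c in enumerate(a))


# G_m(tau) for uniform weighting, ascending coefficients in tau.
G = [
    [Fraction(1)],
    [Fraction(1, 2), Fraction(-1)],
    [Fraction(1, 12), Fraction(-1, 2), Fraction(1, 2)],
]


def k_m(m):
    return Fraction(factorial(2 * m) * factorial(2 * m + 1), factorial(m) ** 2)


def shifted(E, t):
    """Coefficients in tau of F(tau) = E(t - tau), built by Horner's rule."""
    out = [Fraction(0)]
    for c in reversed(E):
        out = poly_mul(out, [Fraction(t), Fraction(-1)])
        out[0] += c
    return out


def responses(E, t, n):
    """V_m = k_m * integral of F(tau) G_m(tau) W0(tau), as in (3)."""
    F = shifted(E, t)
    return [k_m(m) * integrate_0_1(poly_mul(F, G[m])) for m in range(n + 1)]


def predict(E, t, t_f, n):
    """Predicted value V_0 + G_1(-t_f) V_1 + ... + G_n(-t_f) V_n, as in (7)."""
    V = responses(E, t, n)
    return sum(poly_eval(G[m], -Fraction(t_f)) * V[m] for m in range(n + 1))
```

For the assumed data $E(\lambda) = 3 + 2\lambda$ at $t = 5$, `responses([3, 2], 5, 1)` gives $V_0 = 12$ and $V_1 = 2$, and `predict([3, 2], 5, Fraction(7, 10), 1)` returns $72/5$, which equals $E(5.7)$. The prediction is exact whenever the data are a polynomial of degree not exceeding the order $n$ of the circuit.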


Chapter  12 


This chapter will be devoted to a brief review of some of the methods and techniques which have been used in the physical realization of data-smoothing or weighting functions. The first two sections will be devoted to methods for determining physically realizable approximations to a desired weighting function. The third section takes up the use of feedback amplifiers and servomechanisms in order to avoid the use of coils of generally fantastic sizes. The final section takes up the design of resistance-capacitance networks.

Methods of deriving physically realizable approximations of best weighting functions may be divided into two classes, which may be called, for convenience, $t$-methods and $p$-methods. The $t$-methods are those in which a prescribed best weighting function $W(t)$ is approximated directly by a function $W_a(t)$ of realizable form, viz., a sum of decaying exponential terms and exponentially decaying sinusoidal terms. However, the $t$-methods are most useful when the approximation is restricted to a sum only of exponential terms. According to the discussion in Section A.9, Appendix A, such a restriction corresponds physically to passive RC transmission networks. A $t$-method was used by Phillips and Weiss in the reference quoted in Section 10.3 to obtain an approximation with one decaying exponential term and one exponentially decaying sinusoidal term. However, this method rapidly becomes unwieldy as the number of terms is increased.

The $p$-methods are those in which the approximation is derived indirectly from the transmission function $Y(p)$ corresponding to $W(t)$. A rational function $Y_a(p)$ approximating $Y(p)$ is first determined. If it is realizable, and it usually is, then $W_a(t) = \mathcal{L}^{-1}[Y_a(p)]$. In general, $Y_a(p)$ will have complex poles and, therefore, $W_a(t)$ will have exponentially decaying sinusoids as well as simple exponentials. This gives the $p$-methods a considerable advantage over the $t$-methods in more efficient use of network elements. The fact that this generally calls for impractical element values in passive RLC networks is not serious. As shown in Section 12.3, the use of coils may be avoided entirely by the use of feedback amplifiers.

12.1 t-METHODS

To describe the $t$-method,ᵃ let

$$W_a(t) = A_1 e^{-\alpha_1 t} + A_2 e^{-\alpha_2 t} + \cdots + A_n e^{-\alpha_n t} \qquad (1)$$

where the $\alpha$'s are prescribed and the $A$'s are to be determined. Two considerations are involved in the determination of the $A$'s. The first consideration is based on the relationship between the continuity conditions at $t = 0$ and the ultimate slope of the loss characteristic as expressed in the theorem in Section A.8. Accordingly, a number of relations of the type

$$\begin{aligned} A_1 + A_2 + \cdots + A_n &= 0 \\ \alpha_1 A_1 + \alpha_2 A_2 + \cdots + \alpha_n A_n &= 0 \\ &\;\;\vdots \\ \alpha_1^r A_1 + \alpha_2^r A_2 + \cdots + \alpha_n^r A_n &= 0 \qquad r < n - 1 \end{aligned} \qquad (2)$$

must be satisfied. This leaves $n - r - 1$ of the $A$'s for the second consideration.

The second consideration concerns the manner in which the approximation in the range $t > 0$ is to be made. The approximation may, for example, be required to pass through $n - r - 1$ points on $W(t)$, or the first $n - r - 1$ moments of the approximation may be required to be equal to the corresponding moments of $W(t)$. The latter is expressed by relations of the type

$$\frac{A_1}{\alpha_1^s} + \frac{A_2}{\alpha_2^s} + \cdots + \frac{A_n}{\alpha_n^s} = \frac{1}{(s - 1)!} \int_0^{\infty} W(t)\, t^{s-1}\, dt \qquad s = 1, 2, \ldots, n - r - 1 \qquad (3)$$
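Relations (2) and (3) together form a set of $n$ linear equations in the $n$ unknown $A$'s once the $\alpha$'s are fixed. The following sketch solves them exactly. It rests on assumptions not found in the report: the target weighting function is taken to be the parabola $W(t) = 6t(1 - t)$ on $0 \le t \le 1$, the exponents $\alpha = 1, 2, 4$ are arbitrary illustrative values, only the first of (2) is imposed, and the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def solve_linear(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def target_moment(W, s):
    """(1/(s-1)!) * integral over 0..1 of W(t) t**(s-1), W as ascending coefficients."""
    total = sum(Fraction(c) / (i + s) for i, c in enumerate(W))
    return total / factorial(s - 1)


def t_method(alphas, n_continuity, W):
    """Solve the first n_continuity relations of (2) plus moment relations (3)."""
    n = len(alphas)
    rows, rhs = [], []
    for q in range(n_continuity):            # relations (2): sum of alpha**q * A = 0
        rows.append([Fraction(a) ** q for a in alphas])
        rhs.append(0)
    for s in range(1, n - n_continuity + 1):  # relations (3): moment matching
        rows.append([1 / Fraction(a) ** s for a in alphas])
        rhs.append(target_moment(W, s))
    return solve_linear(rows, rhs)


# Assumed target: W(t) = 6t - 6t**2 on 0 <= t <= 1; assumed exponents 1, 2, 4.
A = t_method([1, 2, 4], 1, [0, 6, -6])
```

With these assumed values the solution is $A = (-\tfrac{2}{3},\ 6,\ -\tfrac{16}{3})$, so that $W_a(0) = 0$ and the area and first moment of $W_a(t)$ equal those of the parabola ($1$ and $\tfrac12$). No attempt is made here to judge the quality of the resulting fit or its settling behavior.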

Foster's investigations were concerned only with the parabolic weighting function (4) Chapter 10, so that only the first of (2) was involved. Numerical studies led to the belief that, with a given number of $\alpha$'s, the best approximation was to be had from the case in which all of the $\alpha$'s are equal. Hence the natural center of attention was the special form

$$W_a(t) = \left(A_1 t + A_2 t^2 + \cdots + A_{n-1} t^{n-1}\right) e^{-\alpha t} . \qquad (4)$$

ᵃ The $t$-method is principally due to R. M. Foster.

At large values of $t$ this expression reduces approximately to the last term, and if it is assumed that $A_{n-1} = 1$, the settling condition fixes $\alpha$ to at least a first approximation. The rest of the work of approximating the parabola is then equivalent to a problem in polynomial approximation. Once the $A$'s are determined, a better value of $\alpha$ can be found from the settling condition, and the process gone through again.

If the $\alpha$'s are only approximately equal, the approximation will still behave approximately like (4) with an average value used for $\alpha$. The difficulty with equal or nearly equal $\alpha$'s is that it leads to networks with extreme element values. In order to secure satisfactory element values, it is generally necessary to depart substantially from the condition of equal $\alpha$'s. This results in some, but not a large, loss of efficiency in approximating the parabola. Foster recommends that the $\alpha$'s be chosen as a geometric series, with their geometric mean more or less around the equivalent point for equal $\alpha$'s. With four $\alpha$'s he suggests that the constant ratio in the series may be 3:2, whereas with only two $\alpha$'s the ratio should be raised to 2:1. These are, however, only rough values and obviously depend on individual opinion of what constitutes an unreasonable element value.

As a matter of experience, it turns out that the characteristic first obtained usually has a rather long and slowly decaying tail, as shown in Figure 1. This, of course, is equivalent to a correspondingly long "settling time," or time before a useful prediction can be made. In practice, therefore, after the preliminary design has been found, adjustments are made to bring the tail of the curve under control, partly by modifying the values of the $A$'s slightly, and partly by contracting the time scale to bring the part of the tail which remains appreciable within the allowable settling time limits. This leads to the somewhat lopsided match to the parabola shown in Figure 2.

Figure 1. Approximation to parabolic weighting function, showing poor settling characteristic.

Figure 2. Approximation to parabolic weighting function, showing better settling characteristic.

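A method of bringing the tail of the curve under controlᵇ is to minimize the expression

$$\int_T^{\infty} \left[W_a(t)\right]^2 dt = \sum_l \sum_m C_{lm}\, A_l A_m , \qquad C_{lm} = \frac{e^{-(\alpha_l + \alpha_m) T}}{\alpha_l + \alpha_m}$$

under the restrictions (2) and all but the last of (3).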

The $t$-method used by Phillips and Weiss is based on a 3-term approximation of the form (1) in which one $\alpha$ is real while the other two may be conjugate complex. The $\alpha$'s are not prescribed, so that there are six parameters to be determined. Four restrictions are imposed, viz., the first of (2), the first of (3), a restriction on the value of the tail area, viz.,

$$\int_T^{\infty} W_a(t)\, dt = \sum_{l=1}^{3} \frac{A_l\, e^{-\alpha_l T}}{\alpha_l} ,$$

and the cross-over condition

$$W_a(T) = 0 .$$

Finally, the transmitted noise power, which, under the assumption of flat random noise associated with the position data, takes the form (see Section 10.4)

$$\int_0^{\infty} \left[W_a(t)\right]^2 dt ,$$

is minimized with respect to the two remaining parameters by numerical methods.

ᵇ Used by R. F. Wick.






Three $p$-methods have been used. These will be described in chronological order.

The first $p$-method is one which was used by R. L. Dietzold in exploiting the use of feedback amplifiers to secure the advantages of approximations with complex exponentials. The transmission function $Y(p)$ corresponding to the best weighting function $W(t)$ is first formulated. The loss characteristic, $-20 \log_{10} |Y(i\omega)|$, is next computed and plotted against the frequency on a logarithmic scale. Then standard equalizer design techniques are employed to approximate the loss characteristic, keeping in mind that the transmission loss in the feedback network of a feedback amplifier becomes a transmission gain for the circuit as a whole.

The second $p$-method is merely a more complete analytic formulation of the first, thereby avoiding the necessity for employing equalizer design techniques. It depends upon the possibility of expressing the transmission function corresponding to the best weighting function in the form of equation (13) Chapter 11, which is associated with the symmetry of the weighting function, as shown in Section A.7. The method is based upon the determination of the envelope of the $Q$-function. The $Q$-function is first differentiated in order to obtain the equation which determines the values of $\omega$ at which the maxima and minima occur. This transcendental equation is not solved but is used to eliminate the trigonometric functions in the expression of the $Q$-function. The resulting expression, which is an irrational function of $\omega^2$, is then squared in order to make it a rational function of $\omega$. The substitution $p^2 = -\omega^2$ is made and the expression is then resolved into two factors, of which one contains all the poles with negative real parts while the other contains all the poles with positive real parts, the two factors being conjugate complex when $p = i\omega$. The first factor is then taken as an approximation of the desired transmission function. Applying the method to the desired transmission functions defined by (13) and (14) of Chapter 11, we get

$$y_0(p) \approx \frac{2}{2 + p}, \qquad y_1(p) \approx \frac{12}{12 + 6p + p^2}, \qquad y_2(p) \approx \frac{120}{120 + 60p + 12p^2 + p^3} .$$

This last is the basis for the design of a position and rate smoothing circuit for a proposed computor for controlling bombers from the ground.¹¹ This design is described briefly in Chapter 13.
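The loss characteristic used in the first $p$-method, and the envelope property on which the second is based, can be checked numerically. The sketch below is an illustration under stated assumptions, not a design procedure from the report: it takes the exact function $(1 - e^{-p})/p$ for uniform weighting over unit time and the rational approximation $2/(2 + p)$, evaluates both losses in decibels at $p = i\omega$, and uses the first positive root of $\tan x = x$ (a standard numerical constant) as the point where the loss curve of the exact function should touch its envelope. The function names are hypothetical.

```python
import cmath
import math


def y0_exact(p):
    """Transmission function of uniform weighting over unit smoothing time."""
    return (1 - cmath.exp(-p)) / p


def y0_approx(p):
    """Rational approximation obtained from the envelope of sin(x)/x."""
    return 2 / (2 + p)


def loss_db(Y, omega):
    """Loss characteristic -20 log10 |Y(i omega)| in decibels."""
    return -20 * math.log10(abs(Y(1j * omega)))


# First positive root of tan(x) = x, where sin(x)/x has an extremum.
X_EXTREMUM = 4.493409457909064
omega_touch = 2 * X_EXTREMUM  # x = omega / 2

loss_exact = loss_db(y0_exact, omega_touch)
loss_approx = loss_db(y0_approx, omega_touch)
```

At `omega_touch` the two losses agree to within rounding error, since $|\sin x / x| = 1/\sqrt{1 + x^2}$ wherever $\tan x = x$. Elsewhere the rational function has the smaller loss, because it follows the envelope of the lobes rather than the lobes themselves.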

The third $p$-method is based upon the ascending power-series expansion of the transmission function corresponding to the best weighting function. Examples of such power series are given by (15) of Chapter 11. The method of approximation is one which is credited to Padé in O. Perron's "Kettenbrüchen." If the discussion in Section A.8 is referred to, it will be seen to be also a method of moments.

The method consists in determining the coefficients in a rational function of the form

$$\frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n} \qquad (7)$$

so that the ascending power-series expansion of the rational function will agree with that of the best transmission function, term for term up to and including $p^{m+n}$. If the series for the best transmission function is

$$1 + c_1 p + c_2 p^2 + \cdots + c_{m+n}\, p^{m+n} + \cdots \qquad (8)$$

the equations which determine the coefficients in (7) are obtained by equating coefficients of corresponding powers of $p$, up to and including the $(m + n)$th, in

$$\left(1 + b_1 p + \cdots + b_n p^n\right) \left(1 + c_1 p + \cdots + c_{m+n}\, p^{m+n} + \cdots\right) = 1 + a_1 p + \cdots + a_m p^m .$$

The last $n$ equations will be homogeneous in the $b$'s and $c$'s.

It has been expedient in some cases to omit the last few of the $(m + n)$ equations in order to have some control over the number of real roots and poles and the number of conjugate pairs of complex roots and poles in the resulting rational function.
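The coefficient-matching procedure just described can be sketched as follows. This is a minimal illustration, assuming all $m + n$ equations are retained and that the series coefficients are supplied exactly; the function names are hypothetical. The last $n$ equations are solved first for the $b$'s, and the $a$'s then follow from the first $m$ equations.

```python
from fractions import Fraction
from math import factorial


def solve_linear(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def pade(c, m, n):
    """Return (a, b), ascending coefficients with a[0] = b[0] = 1, matching
    the series c[0] + c[1] p + ... through the term in p**(m+n)."""
    def coef(i):
        return c[i] if i >= 0 else Fraction(0)

    # Equations for powers m+1 .. m+n: sum over j of b_j * c_(k-j) = 0.
    A = [[coef(k - j) for j in range(1, n + 1)] for k in range(m + 1, m + n + 1)]
    rhs = [-coef(k) for k in range(m + 1, m + n + 1)]
    b = [Fraction(1)] + (solve_linear(A, rhs) if n else [])
    # Equations for powers 0 .. m give the numerator coefficients.
    a = [sum(b[j] * coef(k - j) for j in range(min(k, n) + 1)) for k in range(m + 1)]
    return a, b


# Series of (1 - exp(-p))/p: c_k = (-1)**k / (k+1)!
y0_series = [Fraction((-1) ** k, factorial(k + 1)) for k in range(4)]
# Series 60 * sum (k+1)(k+2)/(k+5)! * (-p)**k
y2_series = [Fraction(60 * (k + 1) * (k + 2) * (-1) ** k, factorial(k + 5)) for k in range(4)]

a0, b0 = pade(y0_series, 0, 1)
a2, b2 = pade(y2_series, 0, 3)
```

With $m = 0$, $n = 1$ the first series gives $1/(1 + p/2)$. With $m = 0$, $n = 3$ the second gives the denominator $1 + \tfrac{1}{2} p + \tfrac{3}{28} p^2 + \tfrac{1}{84} p^3$. The choice $m = 0$ in both cases is an assumption made only to keep the example small.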

In the assumed rational expression (7) the difference $n - m$ should be chosen so that the ultimate slope of the loss characteristic will be the same as for the best transmission function. According to the theorem in Section A.8, if $W(t)$ behaves like $t^r$ as $t \to 0$, we should take $n - m = r + 1$. As a matter of experience the rational expression has invariably turned out to be physically realizable whenever this "rule" was followed. Frequently, however, the rational expression has turned out to be physically realizable under small departures from the "rule."

Examples of this method are given in Chapter 13.


In  this  section  we  shall  describe  the  use  of 
feedback  amplifiers  and  servomechanisms  to 
obtain  desired  transmission  functions.  For  com- 
plete discussions  of  the  most  recent  technical 
advances  in  the  analysis  and  design  of  feedback 
amplifiers  and  servomechanisms  the  reader 
should  consult  some  of  the  modern  literature 
on  these  subjects.2  3-51sl61T 

Let us assume that we have two networks whose transmission functions are $Y_1(p)$ and $Y_2(p)$, respectively, as shown in Figure 3. For a signal $E(t)$ applied to the first network the short-circuit output current is $I_1(t) = Y_1(p) \cdot E(t)$. For a signal $V(t)$ applied to the second network the short-circuit output current is $I_2(t) = Y_2(p) \cdot V(t)$. With the networks sharing a common short-circuiting conductor as shown in Figure 4, the current through the conductor is $I_1 + I_2$. If the source which develops the voltage $V(t)$ across the input terminals of the second network were in fact under the control of the current through the conductor, as shown schematically in Figure 5, in such a manner that it had to develop that voltage $V(t)$ which reduces the current in the conductor to zero, then

$$Y_1(p) \cdot E(t) + Y_2(p) \cdot V(t) = 0 .$$

Hence, the transmission function (now a voltage-voltage ratio) of the arrangement shown in Figure 5 must be

$$Y(p) = -\frac{Y_1(p)}{Y_2(p)} . \qquad (9)$$

Figure 3. Schematic representation of networks for feedback circuit application.

Figure 4. First step in combining networks.

Figure 5. Output voltage controlled by short-circuit current across intermediate terminals.

This relationship provides a method of obtaining transmission functions with complex poles without the requirement of coils.ᶜ The complex roots of $Y(p)$ must be assigned to the numerator of $Y_1(p)$, and the complex poles of $Y(p)$ to the numerator of $Y_2(p)$. Aside from this, the other roots and poles of $Y(p)$ may be assigned in any way which is favorable to good design practice. Redundant factors may be introduced if they are desirable, as is done in the examples described in Sections 13.1.5 and 13.3.
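The assignment of roots and poles can be illustrated with a small calculation of $Y(p) = -Y_1(p)/Y_2(p)$ for rational $Y_1$ and $Y_2$. The example values below are hypothetical and chosen only for the arithmetic: $Y_1 = 1$ and $Y_2 = (1 + p + p^2)/(1 + p)$, so that $Y_2$ has a real pole but a pair of complex roots. No claim is made that these particular functions correspond to any network in the report.

```python
import cmath


def poly_mul(a, b):
    """Multiply two polynomials given as ascending coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def feedback_transmission(Y1, Y2):
    """Y = -Y1/Y2 for Y1 = (N1, D1) and Y2 = (N2, D2); returns (numerator, denominator)."""
    (n1, d1), (n2, d2) = Y1, Y2
    return [-c for c in poly_mul(n1, d2)], poly_mul(d1, n2)


def quadratic_roots(c0, c1, c2):
    """Roots of c0 + c1 p + c2 p**2."""
    disc = cmath.sqrt(c1 * c1 - 4 * c2 * c0)
    return (-c1 + disc) / (2 * c2), (-c1 - disc) / (2 * c2)


# Hypothetical networks: Y1 = 1, Y2 = (1 + p + p**2) / (1 + p).
Y1 = ([1], [1])
Y2 = ([1, 1, 1], [1, 1])

num, den = feedback_transmission(Y1, Y2)
poles = quadratic_roots(*den)
```

Here `num` is `[-1, -1]` and `den` is `[1, 1, 1]`, so $Y(p) = -(1 + p)/(1 + p + p^2)$. Its poles, at $-\tfrac12 \pm i\,\tfrac{\sqrt{3}}{2}$, are the complex roots of the numerator of $Y_2$, while $Y_2$ itself has only the real pole $p = -1$.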

The source of the voltage $V(t)$ in Figure 5 does not have to be controlled by the current through the short-circuiting conductor. Since the current through any short circuit must be zero if the voltage across the short-circuited terminals is zero before the short circuit is connected across them, the source of the voltage $V(t)$ may just as well be controlled by the open-circuit voltage, as shown in Figure 6. It is clear that the source of the voltage $V(t)$ is ideally an infinite gain amplifier. It is not necessary, however, that the amplifier have ideally unilateral transmission and infinite input and output impedances, since departures from these ideal characteristics may be compensated for in the design of the feedback network.

The simple result expressed by (9) may be readily modified to take account of the finite gain of a physical amplifier. The modification will be expressed as an extra factor which corresponds to the "$\mu\beta$ effect" or "$\mu\beta$ error" commonly encountered in the theory and design of feedback amplifiers.

ᶜ This observation was first made by R. L. Dietzold.



Figure  6.    Output  voltage  controlled  by  open- 
circuit  voltage  across  intermediate  terminals. 

The exact transmission function of the circuit shown in Figure 6 is most simply expressed in terms of the following quantities:

$Y_1(p)$ = current through a short across terminal-pair No. 3, per unit emf applied across terminal-pair No. 1.
$Y_2(p)$ = current through a short across terminal-pair No. 3, per unit emf applied across terminal-pair No. 2.
$Z_2(p)$ = impedance between terminal-pair No. 2, with terminal-pair No. 3 shorted.
$Z_3(p)$ = impedance between terminal-pair No. 3, with amplifier dead, terminal-pair No. 1 shorted, and terminal-pair No. 2
$G(p)$ = transadmittance of amplifier.

i  - 



The quantity $G Y_2 Z_2 Z_3$ is the $\mu\beta$ of the circuit. The quantity $Y_1 Y_2 Z_2 Z_3$, to which $Y$ reduces when $G = 0$, represents the direct transmission of the circuit.

The  active  impedance  across  terminal-pair 
No.  2  is 




1  —  Gi  2Z2Z3 

ziP  =  zt{\  +  r|?,z,) .  (12) 

$Z_{2P}$ is the passive impedance across terminal-pair No. 2. It differs from $Z_2$ in that terminal-pair No. 3 is open.

The exact expression (10) of the transmission function is useful chiefly as a check on the simpler but approximate expression (9). It is in general quite practicable to make the transadmittance or transconductance $G$ of the amplifier large enough so that the $\mu\beta$ effect may be neglected.
In accordance with the sense in which the term "servomechanism" is used by MacColl,⁴ a feedback circuit such as that shown in Figure 6 is a servomechanism (more specifically, an electronic servomechanism), since it operates on the ideal principle of maintaining zero voltage across the terminal-pair No. 3. An electromechanical counterpart of the circuit shown in Figure 6 is shown in Figure 7. These circuits assume that the signal $E(t)$ is a modulated d-c carrier.

Figure 7. Electromechanical counterpart of feedback amplifier circuit resulting in servomechanism.

If the signal is a modulated a-c carrier, "shaping" cannot be done conveniently by electrical networks. The difficulty may be avoided by various special devices. An example is described and illustrated in Section 13.4.



In this section we will describe and illustrate two general methods of designing RC networks. The first is most useful when the transmission function is finite and not zero at zero frequency; the second, when the transmission function is zero at zero frequency. The case of a transmission function with a pole at zero frequency will not be considered, since it is covered by the methods described in the preceding section, in conjunction with the methods described below.


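The first general method is most useful when

$$Y(p) = \frac{a_0 + a_1 p + \cdots + a_{n+1}\, p^{n+1}}{1 + b_1 p + \cdots + b_n p^n} \qquad (a_0 > 0) \qquad (13)$$

with simple, real, negative poles. Dividing by $p$, expanding into partial fractions and multiplying through by $p$, we get

$$Y(p) = \frac{a_{n+1}}{b_n}\, p + \left(a_0 + \frac{A_1 p}{p + \alpha_1} + \frac{A_2 p}{p + \alpha_2} + \cdots\right) - \left(\frac{B_1 p}{p + \beta_1} + \frac{B_2 p}{p + \beta_2} + \cdots\right)$$

where the $A$'s, $B$'s, $\alpha$'s and $\beta$'s are positive real quantities. The first term must be associated with those in the first parentheses if $a_{n+1} > 0$, with those in the second parentheses if $a_{n+1} < 0$.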
The transmission function is now in the form

$$Y(p) = Y_A(p) - Y_B(p) \qquad (14)$$

where $Y_A(p)$ and $Y_B(p)$ are physically realizable driving-point admittances of RC type. Each term of the form $pA/(p + \alpha)$ is the admittance of the two-terminal, two-element network shown in Figure 8. Each term in (14) therefore represents a parallel combination of two-element networks of the type shown in Figure 8 and a conductance $a_0$ in the case of $Y_A(p)$, and a capacitance $|a_{n+1}|/b_n$ in the case of either $Y_A(p)$ or $Y_B(p)$. By well-known methods these two-terminal networks may be transformed into a variety of other configurations.

Figure 8. Simple RC network.

Figure 9. Method of realizing RC transmission functions, requiring phase inverter.
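The partial-fraction step can be sketched numerically. The code below is an illustration under assumptions, not the report's procedure in detail: the poles $p = -\alpha_k$ are taken as already known, the denominator is written as $\prod (1 + p/\alpha_k)$, and the function names are hypothetical. Each residue of $Y(p)/p$ at $p = -\alpha_k$ becomes the coefficient of a term $p/(p + \alpha_k)$; positive coefficients are collected in $Y_A$ and negative ones, with sign reversed, in $Y_B$.

```python
from fractions import Fraction


def poly_eval(coeffs, x):
    return sum(Fraction(c) * Fraction(x) ** i for i, c in enumerate(coeffs))


def rc_split(numer, alphas):
    """Expand Y(p) = N(p) / prod(1 + p/alpha_k) as
    a0 + c*p + sum of R_k * p / (p + alpha_k).
    numer holds a_0 .. a_(n+1) in ascending order (missing entries are zero)."""
    n = len(alphas)
    b_n = Fraction(1)
    for a in alphas:
        b_n /= a                      # leading denominator coefficient
    a_top = numer[n + 1] if len(numer) > n + 1 else 0
    terms = []
    for k, ak in enumerate(alphas):
        dprime = Fraction(1) / ak     # derivative of the denominator at p = -alpha_k
        for j, aj in enumerate(alphas):
            if j != k:
                dprime *= Fraction(aj - ak) / aj
        residue = poly_eval(numer, -ak) / (-ak * dprime)
        terms.append((residue, ak))
    return Fraction(numer[0]), Fraction(a_top) / b_n, terms


def evaluate(a0, c, terms, p):
    """Evaluate the expanded form at a real test value of p."""
    return a0 + c * p + sum(r * p / (p + ak) for r, ak in terms)


# Assumed example: Y(p) = 1 / ((1 + p)(1 + p/2)), poles at p = -1 and p = -2.
a0, c, terms = rc_split([1], [1, 2])
```

For the assumed example the expansion is $Y(p) = 1 - \dfrac{2p}{p + 1} + \dfrac{p}{p + 2}$, so that $Y_A = 1 + p/(p + 2)$ and $Y_B = 2p/(p + 1)$. A term $pA/(p + \alpha)$ is the admittance of a resistance $1/A$ in series with a capacitance $A/\alpha$.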

The transmission function (14) may be realized in the arrangement shown in Figure 9 or in that shown in Figure 10. The latter is a lattice network which is suitable only in a balanced-to-ground circuit. To obtain an unbalanced passive equivalent of this network we may resort to steps which will be described later in this section.

Figure 10. Lattice prototype for passive networks with RC transmission characteristics, for which $I = (Y_A - Y_B) \cdot E$.

The second general method of designing RC networks is most useful when

$$Y(p) = p\, \frac{a_0 + a_1 p + \cdots + a_n p^n}{1 + b_1 p + \cdots + b_n p^n} \qquad (a_0 > 0)$$

with simple, real, negative poles. Now, if the lattice in Figure 10 were driven from an infinite-impedance source of current $I_0$, the output current would be

$$I = \frac{1 - Y_B / Y_A}{1 + Y_B / Y_A}\, I_0 .$$

If, furthermore,

$$\frac{Y_B}{Y_A} = \frac{k - Y(p)/p}{k + Y(p)/p} \qquad (16)$$

then $I = \dfrac{Y(p)}{kp}\, I_0$. Taking it for granted for the moment that the lattice can be transformed as shown schematically in Figure 11, we may then discard the condenser across the output terminals and, by Thévenin's theorem, we may replace the condenser across the input terminals and the infinite-impedance current source by a series condenser and a zero-impedance voltage source. The result is shown in Figure 12. Since $I_0 = pC \cdot E$ we now have

$$I = \frac{C}{k}\, Y(p) \cdot E$$

which is the desired result, to a constant factor.

The factor k should in general be taken as small as possible subject to the requirement that all the roots and poles of (16) be simple, real, and negative. It can always be taken large enough to fulfill this requirement. A suitable value may be easily chosen by inspection of a plot of Y(p)/p for negative real values of p.

Figure 11. Step in transformation of networks with zero transmission at zero frequency.

Figure 12. Final step in transformation of networks with zero transmission at zero frequency.

The numerator and denominator of (16) are of equal degree and therefore contain the same number of linear factors. These factors may be assigned to YA or to YB arbitrarily except that YA and YB must be physically realizable driving-point admittance functions which behave ultimately like condensers as the frequency increases indefinitely; that is, roots and poles must alternate and there must be a simple pole at infinity.

There  are  five  kinds  of  steps  which  may  be 
taken  to  transform  a  lattice  into  an  unbalanced 
form.  These  steps  are  based  upon  Bartlett's 
bisection  theorem,14  and  may  be  taken  in  any 
order  and  as  often  as  necessary.  Each  of  them 
will  now  be  described  as  it  would  be  applied 
directly to Figure 10. In the following diagrams a lattice enclosed in a rectangle means an unbalanced network whose configuration may not be known yet, but whose lattice prototype is as shown.

1. Shunt network pulled out of both branches: shown in Figure 13.

2.  Shunt  network  pulled  out  of  the  line  branch 
only:  shown  in  Figure  14. 

3. Series network pulled out of both branches: shown in Figure 15.*

4. Series network pulled out of the lattice branch only: shown in Figure 16.*

5. Breakdown into parallel lattices: a fairly obvious step which need not be illustrated.

Figure 13. Step in transformation of lattice; shunt networks pulled out of both branches.

Figure 14. Step in transformation of lattice; shunt network pulled out of line branch only.

Figure 15. Step in transformation of lattice; series networks pulled out of both branches.

Figure 16. Step in transformation of lattice; series network pulled out of lattice branch only.

* Given in impedance form.

As an example of (13) consider

\[ Y(p) = \frac{a_0 + a_1 p + a_2 p^2}{1 + b_1 p} \]

where all the coefficients are positive. Since

\[ Y(p) = p\,\frac{a_2}{b_1} + a_0 + \left(a_1 - \frac{a_2}{b_1} - a_0 b_1\right)\frac{p}{1 + b_1 p} \]

there is no problem if a1 > (a2/b1) + a0b1. But if a1 < (a2/b1) + a0b1 we have the problem of transforming the lattice in Figure 17. We can apply steps 2 and 4 immediately, but find that the residual lattice cannot be transformed unless a1 > (a2/b1). Under this additional restriction we can apply step 3, obtaining finally the network shown in Figure 18.

Figure 17. Illustrative lattice prototype.
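
The decomposition used in this example can be checked with exact rational arithmetic. The sketch below is only an illustration: the function names and the numerical coefficients are hypothetical, and the three-way classification simply restates the inequalities quoted in the text.

```python
from fractions import Fraction as F


def y_direct(p, a0, a1, a2, b1):
    """Y(p) = (a0 + a1*p + a2*p**2) / (1 + b1*p)."""
    return (a0 + a1 * p + a2 * p * p) / (1 + b1 * p)


def y_expanded(p, a0, a1, a2, b1):
    """Same function split into a condenser term, a conductance, and a residual."""
    residual = a1 - a2 / b1 - a0 * b1  # its sign decides whether a lattice is needed
    return p * a2 / b1 + a0 + residual * p / (1 + b1 * p)


def classify(a0, a1, a2, b1):
    """Report which case of the text a set of positive coefficients falls into."""
    if a1 > a2 / b1 + a0 * b1:
        return "no problem"
    if a1 > a2 / b1:
        return "lattice transformable"
    return "residual lattice not transformable"


# Hypothetical coefficients, chosen only so that a2/b1 < a1 < a2/b1 + a0*b1.
coeffs = dict(a0=F(1), a1=F(5, 2), a2=F(2), b1=F(1))
samples = [F(1, 3), F(2), F(7, 5)]
identity_holds = all(
    y_direct(p, **coeffs) == y_expanded(p, **coeffs) for p in samples
)
case = classify(**coeffs)
```

With these illustrative values the residual coefficient is negative, which is the case that leads to the lattice of Figure 17.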

As  an  example  of  (15)  consider 

Taking k = 1 (the smallest value which may be assigned), we get

\[ \frac{Y_B}{Y_A} = \frac{2p(3 + 16p)}{(1 + 2p)(1 + 16p)} . \]

One way of choosing YA and YB is

\[ Y_A = \frac{(1 + 2p)(1 + 16p)}{2(3 + 16p)}, \qquad Y_B = p . \]

This leads finally to the network shown in Figure 19. Such a simple network is possible of course because Y(p) happens to satisfy the requirements of a physically realizable driving-point admittance function. However, another way of choosing YA and YB is

\[ Y_A = \frac{1 + 2p}{2}, \qquad Y_B = \frac{p(3 + 16p)}{1 + 16p} . \]

This leads to the network shown in Figure 20.
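
That the two assignments are interchangeable can be confirmed numerically. The sketch below assumes the usual relation for a symmetric lattice fed from a current source with short-circuited output, I/I_0 = (YA - YB)/(YA + YB); that relation and all function names are assumptions made for illustration, not quotations from the report.

```python
from fractions import Fraction as F


def choice_1(p):
    """First assignment of factors: YB is a single condenser."""
    ya = (1 + 2 * p) * (1 + 16 * p) / (2 * (3 + 16 * p))
    yb = p
    return ya, yb


def choice_2(p):
    """Second assignment of factors: YA is a resistance shunted by a condenser."""
    ya = (1 + 2 * p) / 2
    yb = p * (3 + 16 * p) / (1 + 16 * p)
    return ya, yb


def current_ratio(ya, yb):
    """I / I0 for the lattice (assumed relation, see lead-in)."""
    return (ya - yb) / (ya + yb)


samples = [F(1, 7), F(1, 2), F(3)]
same_quotient = all(
    choice_1(p)[1] / choice_1(p)[0] == choice_2(p)[1] / choice_2(p)[0]
    for p in samples
)
same_transfer = all(
    current_ratio(*choice_1(p)) == current_ratio(*choice_2(p)) for p in samples
)
```

Under the assumed relation both choices give the same current ratio, (1 + 12p)/(1 + 24p + 64p^2), so they differ only in the element layout, not in the transmission.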


Figure 18. Unbalanced equivalent of illustrative lattice prototype when a2/b1 < a1 < (a2/b1) + a0b1.

Figure 19. RC network with zero transmission at zero frequency.

Figure 20. Another RC network with zero transmission at zero frequency. [Element values legible in the scan: C0 = 1, R0 = 2.]


Chapter  13 


THE ILLUSTRATIVE material described in this chapter is taken from four practical applications:

1. Second-derivative circuit for the M9 antiaircraft director.

2. Position data smoother for the "close support plotting board," with delay correction for constant-velocity aircraft.

3. Position and rate circuit for the "computer for controlling bombers from the ground," with optional delay correction of position data for constant-velocity aircraft.

4. Position and rate circuit using electromechanical servomechanisms.

The  design  and  analytical  procedure  used  in 
the  first  application  has  not  heretofore  been 
described  in  writing.  Hence,  considerably  more 
space  will  be  devoted  to  it  than  to  the  other 
three  applications.  The  latter  have  been  de- 
scribed in  detail  in  reports.1" 1;  13 


13.1.1 Realizable Approximation of Best Transmission Function

The best transmission function for the second-derivative circuit was taken to be

\[ Y(p) = p^2\, y_2(p) , \]

in the notation of Chapter 11. This assumes flat random noise in position data and, arbitrarily, 1-second smoothing and settling time. The series expansion of y2(p) is, according to expressions (15) of Chapter 11,

\[ y_2(p) = 1 - \tfrac{1}{2}p + \tfrac{1}{7}p^2 - \tfrac{5}{168}p^3 + \tfrac{5}{1008}p^4 + \cdots . \]

The form of the rational approximation,

\[ y(p) = \frac{1}{1 + b_1 p + b_2 p^2 + b_3 p^3 + b_4 p^4} , \]

was chosen for simplicity under the requirement that the transmission function p²y(p) should cut off at the rate of 12 db per octave.* This requirement was set as a precaution against noise due to granularity of the coordinate-conversion potentiometers in the director.

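Following the procedure outlined in Section 12.2 the following equations were obtained:

\[ b_1 - \tfrac{1}{2} = 0 \]
\[ b_2 - \tfrac{1}{2} b_1 + \tfrac{1}{7} = 0 \]
\[ b_3 - \tfrac{1}{2} b_2 + \tfrac{1}{7} b_1 - \tfrac{5}{168} = 0 \]
\[ b_4 - \tfrac{1}{2} b_3 + \tfrac{1}{7} b_2 - \tfrac{5}{168} b_1 + \tfrac{5}{1008} = 0 \]

whence

\[ b_1 = \tfrac{1}{2}, \qquad b_2 = \tfrac{3}{28}, \qquad b_3 = \tfrac{1}{84}, \qquad b_4 = \tfrac{1}{1764} . \]

Since

\[ p^4 + 21p^3 + 189p^2 + 882p + 1764 = \left(p^2 + \frac{21 + \sqrt{21}}{2}\,p + 42\right)\left(p^2 + \frac{21 - \sqrt{21}}{2}\,p + 42\right) , \]

y(p) would have two conjugate pairs of complex poles, viz.,

\[ p = -6.40 \pm i\,1.047, \qquad -4.10 \pm i\,5.02, \]

of which one pair is very nearly real.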

In order to simplify the circuit design, however, it was desirable to limit the number of complex poles to a single conjugate pair. This was accomplished by leaving b4 arbitrary so that the denominator of y2(p) was

\[ 1 + \tfrac{1}{2}p + \tfrac{3}{28}p^2 + \tfrac{1}{84}p^3 + b_4 p^4 . \]

A value for b4 which would make this expression vanish at two negative real values of p was found by plotting

\[ 1764\, b_4 = \frac{21}{x^4}\,(x^3 - 9x^2 + 42x - 84) \]

* The design antedated the formulation of the n - m = r + 1 rule given in Section 12.2, according to which the best transmission function should have been taken as p²y3(p) in the notation of Chapter 11. However, no trouble was experienced in obtaining a physically realizable approximation, of the complexity assumed.





against x, as shown in Figure 1. The right-hand member is positive only in the range x > 3.77 and has a maximum of 0.982 at about x = 6.63.

Figure 1. Graphical determination of b4. [Plot of 1764 b4 against x.]
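
The graphical step can be reproduced numerically. The sketch below assumes that the plotted expression is 1764 b4 = 21(x^3 - 9x^2 + 42x - 84)/x^4, which is what results from setting the denominator with arbitrary b4 to zero at p = -x; the function names and search intervals are illustrative choices.

```python
def rhs(x):
    """Value of 1764*b4 that puts a real pole at p = -x (reconstructed expression)."""
    return 21.0 * (x**3 - 9.0 * x**2 + 42.0 * x - 84.0) / x**4


def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Point where the curve crosses zero; it is positive only to the right of it.
x_zero = bisect(rhs, 3.0, 5.0)

# Coarse scan for the maximum of the curve between x = 4 and x = 12.
grid = [4.0 + 0.001 * i for i in range(8001)]
x_peak = max(grid, key=rhs)
peak = rhs(x_peak)

# Any ordinate below the peak gives two real poles; 0.5 is the value in the text.
target = 0.5
x_low = bisect(lambda x: rhs(x) - target, x_zero, x_peak)
x_high = bisect(lambda x: rhs(x) - target, x_peak, 100.0)
```

Choosing an ordinate well below the peak separates the two crossings, which is the reason the text gives for taking 0.5 rather than a value close to the maximum.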

In order to obtain a substantial separation between the two real poles of y2(p), the value 1764 b4 = 0.5 was chosen. The approximation

\[ y(p) = \frac{1}{1 + \tfrac{1}{2}p + \tfrac{3}{28}p^2 + \tfrac{1}{84}p^3 + \tfrac{1}{3528}p^4} \]

has poles at

\[ p = -4.17391, \qquad -31.72813, \qquad -3.04898 \pm i\,4.16463 . \]

The series expansion of this approximation agrees with that of y2(p) to four terms, the fifth term being (37/7056)p⁴ instead of (5/1008)p⁴. The difference in the fifth term is less than 6 per cent.
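
The statements about the series expansion and the poles can be checked with a few lines of exact arithmetic. In the sketch below the denominator coefficients 1/2, 3/28, 1/84 and the two candidate values of b4 (1/1764 and 0.5/1764) are taken from the text; the helper names are hypothetical.

```python
from fractions import Fraction as F


def reciprocal_series(den, n_terms):
    """Coefficients c[k] of 1/den(p) = sum c[k] p^k, for den = [1, d1, d2, ...]."""
    c = [F(1)]
    for k in range(1, n_terms):
        top = min(k, len(den) - 1)
        c.append(-sum(den[j] * c[k - j] for j in range(1, top + 1)))
    return c


def poly_at(den, p):
    """Evaluate den(p) in floating point at a real or complex p."""
    return sum(float(d) * p**k for k, d in enumerate(den))


# Denominator with b4 fixed by the series match, and with 1764*b4 = 0.5.
matched = [F(1), F(1, 2), F(3, 28), F(1, 84), F(1, 1764)]
realized = matched[:4] + [F(1, 3528)]

series_matched = reciprocal_series(matched, 5)
series_realized = reciprocal_series(realized, 5)
relative_gap = (series_realized[4] - series_matched[4]) / series_matched[4]

# The quoted poles should make the realized denominator vanish (to rounding).
quoted_poles = [-4.17391, -31.72813, complex(-3.04898, 4.16463)]
residuals = [abs(poly_at(realized, p)) for p in quoted_poles]
```

The first four coefficients are unaffected by b4, so only the fifth term moves when b4 is changed; the relative gap works out to 2/35, a little under 6 per cent.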

The  realized  approximation  and  the  best 
weighting  function  are  shown  in  Figure  3. 

13.1.2 Transient Responses

The responses of the physical network whose transmission function is p²y(p) are compared to those of the best network whose transmission function is p²y2(p), in Figures 2, 3, and 4. The signals for which (and the formulas by which) these responses were computed are tabulated below.

Signal E(t)            Response formulas
t < 0    t ≥ 0         Realized           Best
0        1             L^-1[p y(p)]       60t(1 - 2t)(1 - t)
0        t             L^-1[y(p)]         30t²(1 - t)²
0        t²/2          L^-1[y(p)/p]       t³(10 - 15t + 6t²)

It has been noted that Figure 3 also represents the best and the realized weighting functions.

Figure 2. Responses to step function, viz., E(t) = 1 when t > 0.

Figure 3. Responses to linear ramp function, viz., E(t) = t when t > 0; second derivative smoothing characteristics.

Figure 4. Responses to parabolic ramp function, viz., E(t) = (1/2)t² when t > 0; second derivative settling characteristics.




If a signal of the form

\[ E(t) = a_0 + a_1 t + \tfrac{1}{2} a_2 t^2 \]

were to be applied suddenly to the second-derivative circuit at t = 0 the response would be

\[ V(t) = \frac{a_0}{T^2}\,A_0\!\left(\frac{t}{T}\right) + \frac{a_1}{T}\,A_1\!\left(\frac{t}{T}\right) + a_2\,A_2\!\left(\frac{t}{T}\right) \]

where A0, A1, A2 stand for the responses shown in Figures 2, 3, and 4, respectively, and where t is the time in seconds and T is the nominal smoothing time. The response V(t) is the indicated acceleration of the target.

The sudden application of the instantaneous position and velocity components of the signal to the second-derivative circuit will give rise to some very serious consequences unless special measures are taken to mitigate them. To see this let it be assumed that T = 20 seconds and that the target is at such a range that a0 = 20,000 yards when the signal E(t) is applied to the second-derivative circuit. Each unit of A0 in the ordinate scale of Figure 2 then represents an indicated acceleration of 50 yd per sec². Referring to Figure 2 it is clear not only that the effective settling time will be several times the smoothing time but also that the indicated acceleration will go through exceedingly large maxima.
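
The size of the effect follows from the scaling of the step response alone. The sketch below assumes that the position term a0 enters the response as (a0/T^2) A0(t/T), and it uses the best-network step response 60u(1 - 2u)(1 - u) from the table of response formulas as a stand-in for the curve of Figure 2; the realized network will differ in detail, and the function names are illustrative.

```python
def best_step_response(u):
    """Best-network response A0 to a unit step, for 0 <= u <= 1 (u = t/T)."""
    return 60.0 * u * (1.0 - 2.0 * u) * (1.0 - u)


def indicated_acceleration_scale(a0, T):
    """Acceleration represented by one unit of A0, assuming the a0/T**2 scaling."""
    return a0 / T**2


T = 20.0        # nominal smoothing time, seconds
a0 = 20000.0    # initial position term, yards

per_unit = indicated_acceleration_scale(a0, T)

# Largest value of the best step response on a fine grid of u = t/T.
peak_units = max(best_step_response(i / 10000.0) for i in range(10001))
peak_acceleration = per_unit * peak_units
```

Even for the best network the peak is several units of A0, so an indicated acceleration of a few hundred yd per sec² appears from the position term alone.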

Exceedingly  large  transient  responses  are 
not  peculiar  to  second-derivative  circuits.  They 
occur  also  in  first-derivative  circuits  in  linear 
prediction,  where  they  are  due  entirely  to  the 
initial  position  term  in  the  signal.  In  all  cases 
they  are  reduced  to  harmless  proportions  by 
special  arrangements  of  the  circuits  during  the 
operation  of  slewing. 

13.1.3 Effect of Tracking Errors on Accuracy of Prediction

The statistical effect of tracking errors on the accuracy of prediction is most readily determined from the power spectrum of the tracking errors and the transmission function of the prediction circuit.

Table 1 gives the values of the transmission function Y1 of the first-derivative circuit in the M9 director, referred to a nominal smoothing time of 1 second, and the transmission function Y2 of the experimental second-derivative circuit design, also referred to a nominal smoothing time of 1 second. The transmission function of the linear prediction circuit with 10-second smoothing of first derivative is then

[expression for Yl(p) not legible in the scan]

while that of the quadratic prediction circuit with 20-second smoothing of second derivative is

[expression for Yq(p) not legible in the scan; it begins Yq(p) = Yl(p) + ...]

where G1 and G2 are determined in accordance with the discussion in Section A.10. Since

\[ Y_1(p) = p\,(1 - 0.3724p + \cdots), \qquad Y_2(p) = p^2\,(1 - \cdots) \]

we get

\[ G_1 = t_f, \qquad G_2 = \tfrac{1}{2} t_f^2 + 3.724\, t_f . \]

Table 1.* [Tabulated values of Y1 and Y2 against f are not legible in the scan.]

* f is in c when smoothing time T = 1 sec. For T-second networks, values of f are multiples of 1/T c, values of Y1 should be divided by T, and values of Y2 should be divided by T². The two networks may have different values of T.





Table 2 gives the values of |Yl(p)|² and of |Yq(p)|² for t_f = 5, 10, 15, 20 seconds. These are plotted in Figures 5, 6, 7, and 8.

The last column of Table 2 and Figure 9 give the power spectrum of a composite of the range and transverse errors in a typical run. The power contained in the frequency range covered by the table accounts for 78 per cent of the total power, or an rms error of 15.8 yards out of 17.9 yards.

The rms error of prediction is the square root of the power transmitted by the prediction circuit. This is tabulated on the last line of Table 2 and in the smaller table following.

Figure 5. Power transmission ratio of linear and quadratic prediction circuits with 5-second prediction time.
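
The relation between power fractions and rms errors used here is simple enough to sketch. In the block below the figures 78 per cent and 17.9 yards are taken from the text; the helper names and the sample spectrum are hypothetical and are not values from Table 2.

```python
import math


def rms_from_power_fraction(total_rms, fraction):
    """Rms value carried by a given fraction of the total power."""
    return total_rms * math.sqrt(fraction)


def rms_transmitted(power_ratio, spectrum, df):
    """Square root of the power passed by a circuit, as a rectangle-rule sum.

    power_ratio[i] is |Y|**2 and spectrum[i] is the error power density at the
    i-th frequency sample; df is the sample spacing.
    """
    return math.sqrt(sum(r * s for r, s in zip(power_ratio, spectrum)) * df)


# 78 per cent of the power of a 17.9-yard rms error.
covered = rms_from_power_fraction(17.9, 0.78)

# Hypothetical samples, for illustration only.
example = rms_transmitted([1.0, 4.0], [2.0, 0.5], 0.5)
```

Because rms values add in quadrature, 78 per cent of the power corresponds to about 88 per cent of the rms error, which is the 15.8 yards quoted.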


Figure 6. Power transmission ratio of linear and quadratic prediction circuits with 10-second prediction time.

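Table 2.* [Tabulated values of |Yl|², |Yq|² and of the power spectrum P are not legible in the scan; surviving entries read 33 9, 55.4 and 125.0.]

* P is in units of 180 yd² per c.

[Smaller table: time of flight in seconds against rms error of prediction due to tracking errors in yards, for linear and quadratic prediction. Values not legible in the scan.]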

It  is  obviously  relatively  disadvantageous  to 
use  quadratic  prediction  when  the  target  is  in 
fact  flying  a  rectilinear  unaccelerated  course. 

Figure 7. Power transmission ratio of linear and quadratic prediction circuits with 15-second prediction time.

Figure 8. Power transmission ratio of linear and quadratic prediction circuits with 20-second prediction time.

The  relative  advantage  of  linear  prediction 
should  persist  for  target  paths  with  only  a 
slight  amount  of  curvature,  but  this  relative 
advantage  should  decrease  as  the  curvature  is 
increased.  When  the  curvature  exceeds  a  cer- 
tain amount,  the  relative  advantage  should 
shift  to  quadratic  prediction. 
The determination of the minimum value of target path curvature at which quadratic prediction becomes relatively advantageous depends not only upon:

1. dispersion of the predicted point of impact due to tracking errors,

but also upon a number of other factors, which are:

2. actual future position of target with respect to the predicted point of impact, assuming an accurate computer and the absence of all sources of dispersion enumerated here;*

3. dispersion due to inaccuracies in the computer and data-transmission systems;

4. dispersion due to noise in the computer and data-transmission systems;

5. dispersion due to variations in actual dead time;

6. dispersion due to gun wear and to variations in powder charge, shell weight, shell shape, etc.;

7. dispersion due to variations in meteorological conditions along the path of the shell;

8. dispersion due to variability of time-fuze calibration; and

9. lethal pattern of shell burst.

Figure 9. Composite power spectrum of tracking errors of experimental radar.

In a special illustrative case, a numerical analysis, including most of these factors (estimated), showed that quadratic prediction becomes relatively advantageous when the target acceleration exceeds about 0.1g. However, this should not be taken as a general result.

* This is considered in detail in the next section.




13.1.4 Linear and Quadratic Prediction Errors on Constant-Velocity Circular Courses

The  use  of  a  finite  number  of  derivatives  of 
the  tracking  data  for  purposes  of  prediction  is 
itself  a  source  of  prediction  errors  even  if  there 
were  no  tracking  errors.  Definite  evaluation  of 
these  prediction  errors  can  be  made  only  if  the 
path  of  the  target  is  prescribed.  The  simplest 
path  which  can  be  prescribed  for  this  purpose 
is  a  circular  one  at  constant  velocity.  Such  a 
path  is  fairly  realistic  when  considered  in  rela- 
tion to  the  difficulty  of  maneuvering  a  bomber 
and  to  actual  records  of  the  paths  of  hostile 
bombers  over  London  during  World  War  II. 

The position of a target flying in a circle at constant velocity, referred to the center of the circle, is expressed by the complex quantity R e^{iωt} where R is the radius of the circle and ω is the angular rate. In terms of the velocity V and the transverse acceleration A, we have R = V²/A, ω = A/V. The predicted position is then at R Y(iω) e^{iωt} where Y(iω) is the transmission function of the prediction circuit. The true future position of the target, however, is at R exp[iω(t + t_f)]. Hence, the prediction error, referred to axes fixed on the target and oriented respectively transverse to and in the direction of the present velocity, is

\[ \epsilon = R\,[\,Y(i\omega) - e^{i\omega t_f}\,] . \]

As an illustration let us consider a case in which V = 150 yd per sec, A = 5 yd per sec² and t_f = 10. For the linear prediction circuit

\[ Y_l(i\omega) = 1.0409 + i\,0.3296 \]

and for the quadratic prediction circuit

\[ Y_q(i\omega) = 0.9501 + i\,0.3610 \]

while

\[ e^{i\omega t_f} = 0.9450 + i\,0.3272 . \]

Hence, when the present position of the target is at 4500 + i0 with respect to the center of the circle, the linear predicted point is at 4684 + i1483, the quadratic predicted point is at 4276 + i1624 while the true future position is at 4252 + i1472. These are shown in Figure 10. The prediction error vectors are

\[ \epsilon_l = 432 + i\,11, \quad |\epsilon_l| = 432; \qquad \epsilon_q = 24 + i\,152, \quad |\epsilon_q| = 154 . \]
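
The arithmetic of this illustration is easy to repeat with complex numbers. In the sketch below the two transmission-function values are copied from the text as given constants; the variable and function names are hypothetical.

```python
import cmath

V = 150.0    # target speed, yd per sec
A = 5.0      # transverse acceleration, yd per sec^2
t_f = 10.0   # time of flight, sec

R = V**2 / A     # radius of the circular course, yd
omega = A / V    # angular rate, rad per sec

# Transmission-function values quoted in the text for this omega and t_f.
Y_linear = complex(1.0409, 0.3296)
Y_quadratic = complex(0.9501, 0.3610)

# Factor that carries the present position into the true future position.
true_factor = cmath.exp(1j * omega * t_f)


def prediction_error(Y):
    """Error vector R * (Y - exp(i*omega*t_f)) in target-fixed axes."""
    return R * (Y - true_factor)


err_linear = prediction_error(Y_linear)
err_quadratic = prediction_error(Y_quadratic)
```

The real part of each error is the component transverse to the present velocity and the imaginary part is the component along it, so the linear prediction misses almost entirely sideways while the quadratic prediction misses mostly along the course.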

Referring to Figure 10 it may be observed that if the first-derivative component of the prediction were to be reduced by approximately 10 per cent a nearly perfect hit would be obtained. This suggests the possibility of determining empirical functions of the time of flight for the potentiometer factors G1 and G2 in order to improve the probability of kill. This would involve consideration of all of the sources of dispersion enumerated in the preceding section as well as a statistical study of target paths. Such a determination has not been made.

Figure 10. Vector diagram of linear and quadratic prediction for constant-velocity circular courses.

13.1.5 Physical Configuration of the Second-Derivative Circuit

In this section we shall derive a physical configuration for the second-derivative circuit. In particular it illustrates the application of feedback to the realization of weighting functions or impulsive admittances involving complex exponentials in general.* It should be pointed out, however, that the application of feedback to the end in view is not restricted to purely electronic circuits. An application involving the use of servomechanisms will be described in Section 13.4.

* Originally proposed by R. L. Dietzold.

The transmission function which concerns us here may be expressed in the partially factored form

\[ Y(p) = \frac{p^2}{(p + 0.2087)(p + 1.5864)(p^2 + 0.3049p + 0.06660)} \]

where the poles have been adjusted to correspond to T = 20 seconds and where a constant factor has been left out.
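
The adjustment of the poles to T = 20 seconds is a plain change of time scale. The sketch below assumes that a pole at p0 of the 1-second design moves to p0/T, and it starts from the 1-second pole values quoted earlier in the chapter; the variable names are illustrative.

```python
T = 20.0  # nominal smoothing time, seconds

# Poles of the 1-second approximation, as quoted earlier in the chapter.
poles_1s = {
    "real_a": -4.17391,
    "real_b": -31.72813,
    "complex": complex(-3.04898, 4.16463),
}

# A pole at p0 for a 1-second network moves to p0 / T for a T-second network.
real_a = -poles_1s["real_a"] / T
real_b = -poles_1s["real_b"] / T
pc = poles_1s["complex"] / T

# (p - pc)(p - conj(pc)) = p**2 + quad_b * p + quad_c
quad_b = -2.0 * pc.real
quad_c = abs(pc) ** 2
```

The four numbers obtained this way are the constants of the two linear factors and of the quadratic factor in the partially factored form.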

The circuit is to be designed to work out of the amplifier in the first-derivative circuit of the M9 director. Since this much of the first-derivative circuit has a transmission function of the form p/(p + 0.24), the transmission function which we have to realize is Y_a(p)/Y_b(p) where

\[ Y_a(p) = \frac{p}{(p + 0.2087)(p + 1.5864)}, \qquad Y_b(p) = \frac{p^2 + 0.3049p + 0.06660}{p + 0.24} . \]

The inversion of the factor corresponding to Y_b(p) is in accordance with the fact that the transmission gain through a feedback amplifier is equal to the loss in the feedback network, provided the feedback is very large. To realize the transmission function Y_a(p)/Y_b(p) it is therefore necessary only to realize the transmission functions Y_a(p) and Y_b(p) individually. The corresponding networks are shown in Figure 11, with typical element values.

Figure 11. Physical configuration of quadratic prediction circuit for modified M9 AA director.

The input network has four elements, whereas Y_a(p) has only two parameters. Hence there are two degrees of freedom in the element values of this network. One degree of freedom must be reserved for the impedance level; the other permits some latitude in the relative values of the resistances and stiffnesses.

The feedback network has four independent elements, whereas Y_b(p) has three parameters. Hence there is only one degree of freedom in the element values of this network. This degree of freedom must be reserved for the impedance level.

There  is,  however,  one  degree  of  freedom  be- 
tween the  impedance  levels  of  the  two  net- 
works. This  follows  from  the  fact  that  the 
transmission  function  of  the  circuit  is  the  ratio 
of  the  transmission  functions  of  the  individual 
networks.  The  scale  factor  for  the  transmission 
function  of  the  circuit  is  readily  determined 
from  the  fact  that  the  transmission  function 
must  be  approximately  pRt,C„  at  small  values 
of  p. 



In this application, position data smoothing with delay correction for constant rates of change in position was required. Assuming flat random noise in position data, and, arbitrarily, 1-second smoothing time, the best transmission function for position data smoothing without delay correction is y0(p) in the notation of Section 11.3. The best transmission function for the first-derivative circuit, if it were required, is py1(p). Hence, the best transmission function for position data smoothing with full delay correction is

\[ Y_1(p) = y_0(p) + \tfrac{1}{2}\, p\, y_1(p) . \]

This corresponds to the weighting function

\[ W(t) = 2(2 - 3t), \qquad 0 < t < 1 . \]

The series expansion for Y1(p) is, by (15) of Chapter 11,

\[ Y_1(p) = 1 - \frac{p^2}{12} + \frac{p^3}{30} - \frac{p^4}{120} + \cdots . \]


The form of the rational approximation was chosen as

\[ Y(p) = \frac{1 + a_1 p}{1 + b_1 p + b_2 p^2 + b_3 p^3} \]

in order to obtain a loss characteristic which has an ultimate slope of 12 db per octave.* This requirement was also set as a precaution against noise due to granularity of the coordinate-conversion potentiometers. The coefficients are determined by

\[ b_1 = a_1 \]
\[ b_2 - \tfrac{1}{12} = 0 \]
\[ b_3 - \tfrac{1}{12} b_1 + \tfrac{1}{30} = 0 \]
\[ -\tfrac{1}{12} b_2 + \tfrac{1}{30} b_1 - \tfrac{1}{120} = 0 \]

so that

\[ Y(p) = \frac{1 + \tfrac{11}{24}p}{1 + \tfrac{11}{24}p + \tfrac{1}{12}p^2 + \tfrac{7}{1440}p^3} . \]

This may be expressed in the form Y(p) = Y_a(p)/Y_b(p) where

\[ Y_a(p) = \frac{1}{1 + 0.1053p}, \qquad Y_b(p) = \frac{1 + 0.3530p + 0.04615p^2}{1 + 0.4583p} . \]

The circuit configuration is shown below in Figure 12.

Figure 12. Physical configuration of data-smoothing circuit for close support plotting board.

* This design also antedated the formulation of the n - m = r + 1 rule given in Section 12.2 according to which we should have taken Y1(p) = y1(p) + (1/2)py2(p).
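
The coefficients of this approximation can be re-derived by matching power series. The sketch below assumes that the transmission function is the transform of the weighting function W(t) = 2(2 - 3t) on 0 < t < 1, so that its series coefficients are (-1)^k M_k / k! with M_k the k-th moment of W; the variable names are illustrative, and the factor check uses the rounded decimals quoted in the text.

```python
from fractions import Fraction as F
from math import factorial


def moment(k):
    """k-th moment of W(t) = 4 - 6t over 0 < t < 1."""
    return F(4, k + 1) - F(6, k + 2)


# Series coefficients c[k] of Y1(p) = sum c[k] p^k.
c = [F((-1) ** k) * moment(k) / factorial(k) for k in range(5)]

# Match (1 + b1 p + b2 p^2 + b3 p^3) * series = 1 + a1 p through p^4.
b2 = -c[2]                        # p^2 terms (c[1] is zero)
b1 = -(c[4] + c[2] * b2) / c[3]   # p^4 terms
b3 = -(c[3] + c[2] * b1)          # p^3 terms
a1 = b1 + c[1]                    # p^1 terms

# Rounded factors quoted in the text: (1 + 0.1053p)(1 + 0.3530p + 0.04615p^2).
lin, q1, q2 = 0.1053, 0.3530, 0.04615
expanded = [lin + q1, q2 + lin * q1, lin * q2]
exact = [float(b1), float(b2), float(b3)]
```

The numerator coefficient a1 = 11/24 = 0.4583 is the same number that appears in the denominator of the feedback factor.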


In  this  application,  rate  smoothing  as  well  as 
position  smoothing  was  required.  In  addition, 
delay  correction  in  position,  for  constant  rate 
of  change,  was  to  be  available  but  optional,  and 
the  loss  characteristic  was  to  have  an  ultimate 
slope  of  12  db  per  octave,  or  more. 

In accordance with the n - m = r + 1 rule, the best transmission function for position data is y1(p), whereas that for rate is py2(p). A number of designs were made on this basis. However, from the point of view of network economy they were inferior to a design based on y2(p) for position data. The use of y2(p) for position data is not consistent, theoretically, with the use of py2(p) for rate, but the practical advantage outweighs the theoretical disadvantage.

The rational approximation used for y2(p) is the one given in (6), Section 12.2. It may be expressed as

\[ y_2(p) = \frac{Y_a(p)\,Y_c(p)}{Y_b(p)} \]

where

\[ Y_a(p) = \frac{1}{1 + 0.2153p}, \qquad Y_b(p) = \frac{1 + 0.2847p + 0.03870p^2}{1 + 0.1359p}, \qquad Y_c(p) = \frac{1}{1 + 0.1359p} . \]

Figure 13. Physical configuration of linear prediction circuit for ground-control bombing computer.

It may be noted that a redundant factor has been introduced, viz., 1 + 0.1359p, in order to secure a physically realizable Y_b(p). The coefficient was chosen so that a resistance would not be required in the shunt branch of the feedback network. Referring to the circuit configuration in Figure 13, the transmission function of the input network is Y_a(p), that of the feedback network is Y_b(p), and that of the output network at the top is Y_c(p).

The output impedance of the amplifier is reduced nearly to zero by virtue of shunt feedback. Hence, the rate circuit, as shown in Figure 13, may be derived from the amplifier output through a simple additional network whose transmission function is pY_c(p). Two rate outputs are provided so that the delay introduced in position may be corrected optionally without disturbing scale factors.

In  the  final  report,  October  25,  1945,  to 
NDRC  Division  7,  on  the  research  program  car- 
ried on  under  Contract  NDCrc-178,  a  list  is 
given  of  a  number  of  the  more  important  prac- 
tical advantages  for  the  use  of  a-c  carrier  in 
computing  circuits.  These  advantages  are: 

1.  Permits  operation  at  lower  levels  before 
running  into  trouble  with  thermal  noise,  contact 
potentials,  drifts  due  to  temperature; 

2.  Permits  use  of  transformers  for  imped- 
ance matching,  voltage  transformations,  cou- 
pling between  balanced  and  unbalanced  circuits ; 

3.  Permits  use  of  hybrid  coils  for  voltage 
summations  of  moderate  precision ; 

4.  Eliminates  the  necessity  for  modulators  in 
servo  circuits  using  a-c  motors ; 

5.  Permits  reduction  in  total  power  consump- 
tion, rectified  power  for  amplifiers,  and  voltage 

However,  the  techniques  of  differentiation 
and  of  data  smoothing  with  fixed  networks  in 
computing  circuits  which  use  d-c  carrier,  are 
not  applicable  to  computing  circuits  which  use 
a-c  carrier. 

The circuit described here is an example of one of the techniques used in the T15-E1 experimental curved flight director. In Figure 14 servo motors* are indicated by M, and generators by G. The motors are two-phase induction motors with one phase winding of each energized directly by the carrier source at constant amplitude. The generators are essentially two-phase induction motors also with one phase winding of each energized directly by the carrier source at constant amplitude. They deliver, at the other phase windings, carrier voltage at amplitudes proportional to the angular velocities dθ1/dt and dθ2/dt of the shafts. The potentiometers are energized by the carrier source at constant amplitude. They deliver carrier voltage at amplitudes proportional to the angular positions θ1 and θ2 of the shafts from some reference positions. The position data are represented by the modulation amplitude E.

* The technique of using servo motors for smoothing, as described above, is due chiefly to E. L. Norton.

Figure 14. Electromechanical linear prediction circuit.

With amplifiers of sufficiently large voltage gain and power capacity, and motors of sufficiently large torque, the operational equations of the circuit are readily found by equating to zero the sum of the voltages applied to each amplifier. Thus

\[ \theta_1 + (\alpha_1 + \beta p)\,\theta_2 = E \]
\[ p\,\theta_1 - (1 + \alpha_2 p)\,\theta_2 = 0 \]

whence

\[ \theta_1 = \frac{1 + \alpha_2 p}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}\,E, \qquad \theta_2 = \frac{p}{1 + (\alpha_1 + \alpha_2)p + \beta p^2}\,E . \]

The angular position θ1 therefore represents the smoothed position data while the angular position θ2 represents the smoothed rate.
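
The solution of the two operational equations can be verified by substituting it back at an arbitrary complex frequency. The sketch below is only a consistency check: the constants alpha1, alpha2 and beta are hypothetical values, and the function names are not from the report.

```python
def solve_servo(E, p, alpha1, alpha2, beta):
    """Closed-form theta1, theta2 for the pair of operational equations."""
    den = 1 + (alpha1 + alpha2) * p + beta * p * p
    theta1 = (1 + alpha2 * p) * E / den
    theta2 = p * E / den
    return theta1, theta2


def residuals(E, p, alpha1, alpha2, beta):
    """Left side minus right side of each operational equation."""
    theta1, theta2 = solve_servo(E, p, alpha1, alpha2, beta)
    r1 = theta1 + (alpha1 + beta * p) * theta2 - E
    r2 = p * theta1 - (1 + alpha2 * p) * theta2
    return r1, r2


# Hypothetical constants and test frequency, for illustration only.
params = dict(E=1.0, alpha1=0.4, alpha2=0.6, beta=0.25)
r1, r2 = residuals(p=complex(0.0, 0.3), **params)

# At very low frequency theta1 follows E and theta2 follows p*E (the rate).
low = 1e-6j
theta1_low, theta2_low = solve_servo(p=low, **params)
```

The low-frequency behaviour shows why the two shaft angles can be read as smoothed position and smoothed rate: both share the same second-order denominator and differ only in the numerator.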


Chapter  14 


The  past  discussion  has  been  more  or  less 
clearly  directed  at  predictor  systems  hav- 
ing certain  well-defined  properties.  For  ex- 
ample, it  has  been  tacitly  assumed  that  the  first 
part  of  the  prediction  system  will  consist  of 
geometrical  manipulations  transforming  the 
raw  input  data  into  other  quantities,  such  as 
the  components  of  velocity  in  Cartesian  or  in- 
trinsic coordinates,  which  we  have  some  physi- 
cal reason  to  believe  should  be  approximately 
constant  for  extended  periods."  These  quanti- 
ties, then,  are  isolated  explicitly  in  the  circuit 
and  are  the  actual  effective  inputs  of  the  data- 
smoothing  networks.  The  data-smoothing  net- 
works themselves  are,  of  course,  definitely 
assumed  to  be  linear  and  invariable. 

This  is  obviously  a  straightforward  attack 
but  it  does  not  necessarily  exhaust  all  possibili- 
ties. For  example,  advantages  may  be  gained 
by  using  data-smoothing  networks  which  are 
nonlinear  or  which  vary  with  time  or  target 
position.  It  may  also  be  possible  to  smooth  the 
input  data  according  to  some  geometric  as- 
sumption, such  as  straight  line  flight,  without 
the  necessity  of  isolating  geometrical  parame- 
ters explicitly. 

This  chapter  attempts  to  illustrate  these  pos- 
sibilities by  some  rather  scattered  examples. 
Data-smoothing  networks  which  vary  with  time 
seem  to  give  improved  performance  over  fixed 
networks,  and  have  been  studied  with  some 
care.  Several  examples  are  given  at  the  end  of 
the  chapter.  None  of  the  other  lines,  however, 
has  been  explored  at  all  thoroughly.  The  ex- 
amples of  data-smoothing  networks  variable 
with  time  are,  in  a  sense,  illustrations  of  non- 
linearity  also,  since  they  all  operate  on  the 
assumption  that  the  cycle  of  the  network's 
variation  with  time  begins  anew  at  each 
marked  change  in  course.  Since  a  change  in 
course  is  exactly  like  a  tracking  error,  except 
that  it  is  much  larger,  this  resetting  requires 
a nonlinear control circuit which responds
to large amplitude effects but not to small ones.

a  This is true ideally even in the Wiener system since
Wiener assumes that transformations will be made to
some suitable coordinate system, preferably the intrinsic,
before the statistical prediction method is applied.

This,  however,  is  evidently  a  very  mild  sort  of 
nonlinearity.  More  thoroughgoing  nonlineari- 
ties  have  not  been  studied.  There  seems  to  be 
no  a  priori  reason  for  supposing  that  they 
would  appreciably  improve  the  performance 
of  data-smoothing  networks. 

The  first  part  of  the  chapter  gives  examples 
of  data-smoothing  schemes  which  do  not  re- 
quire the  isolation  of  geometrical  parameters. 
They  are  based  on  degenerative  feedback  cir- 
cuits which  satisfy  the  requisite  formal  rela- 
tions but  which  might,  in  some  cases,  be  un- 
stable in  practice.  This  portion  of  the  material 
is  included  primarily  for  its  possible  sugges- 
tive value  rather  than  for  its  concrete  practical 


The  diversity  of  particular  circuits  can  be 
given a certain unity by regarding them all as
modifications  of  the  feedback  smoothing  cir- 
cuit shown  originally  in  Figure  2  of  Chapter 
10.  In  accordance  with  the  discussion  of  that 
figure  it  will  be  convenient  to  suppose  that  the 
resistive  feedback  path  is  introduced  to  limit 
the  gain  of  the  amplifier  proper,  so  that  the 
structure  reduces  to  an  amplifier  with  high  but 
finite  gain  and  a  pure  capacity  feedback.  The 
circuit  has  a  net  loop  gain,  and  is  consequently 
degenerative,  at  any  moderately  high  frequency. 
For  our  present  purposes,  it  is  convenient  to 
recall  the  general  property  of  degenerative 
feedback  amplifiers,  that  they  tend  to  suppress 
any  given  frequency  by  the  amount  of  the  de- 
generative feedback  for  that  frequency.  This 
suppression  obtains  not  only  at  the  amplifier 
output  but  at  many  other  points  in  the  circuit 
as  well.  For  example,  it  holds  at  the  amplifier 
input if we combine the original applied voltage
with the voltage contributed by the feedback
circuit.ᵇ Thus, except for the absolute

b  This  follows  immediately  from  the  fact  that,  since 
the  characteristics  of  the  amplifier  proper  are  not 
changed  by  the  addition  of  the  feedback  path,  the 
output  voltage  is  always  a  fixed  multiple  of  the  net 
input  voltage. 





signal  level,  it  is  not  necessary  to  transmit 
through  the  amplifier  of  Figure  2  of  Chapter 
10  in  order  to  produce  the  smoothing  effect.  It 
would  be  sufficient  to  hang  the  input  circuit  of 
the  amplifier,  as  a  two-terminal  impedance, 
across  the  circuit. 


The property of degenerative feedback circuits
which has just been described is conveniently
illustrated by a three-dimensional extension
of the original smoothing circuit of
Figure 2 of Chapter 10. The three-dimensional
circuit is shown in Figure 1. The three input
voltages are the quantities $\dot{D}$, $D\dot{E}$, and $D\dot{A}\cos E$,
where D, E, and A are, respectively, slant
range, elevation, and azimuth. The three voltages
will be recognized as the three components
of the target motion in a tilted and rotating
rectangular coordinate system. One axis of the
tilted system is directed along the instantaneous
line of sight to the target and the other
two are perpendicular to this one in the vertical
and horizontal planes respectively.ᶜ It is
assumed that these input rates represent target
motion in a straight line, plus the usual tracking
errors. The object of the smoothing system
is to provide shunt impedances which will tend
to suppress the tracking errors by feedback
action, according to the principles described in
the preceding section, without disturbing the
portions of the input voltages corresponding to
the assumed straight line path.

Figure 1. Feedback smoothing in three coordinates.

We  can  simplify  the  analysis  by  restricting 
our  attention  to  the  special  case  of  two-dimen- 
sional motion  which  occurs  when  the  target 
course  lies  in  a  vertical  plane  passing  directly 
through the antiaircraft position. This is illustrated
in Figure 2. In this case the component
$D\dot{A}\cos E$ is evidently zero. If we represent
the voltages at the other two terminals, including
both the original applied voltages and the
voltages fed back through the circuit, by $V_1$ and
$V_2$, the voltages coming out of the coordinate
converter on the right-hand side in Figure 2 are

$$v_x = V_1 \cos E - V_2 \sin E$$
$$v_y = V_2 \cos E + V_1 \sin E . \qquad (1)$$

These voltages are differentiated, passed
through a second coordinate converter, and fed
back so that the output voltages must satisfy

$$V_1 = \dot{D} - \mu(\dot{v}_x \cos E + \dot{v}_y \sin E)$$
$$V_2 = D\dot{E} - \mu(\dot{v}_y \cos E - \dot{v}_x \sin E) . \qquad (2)$$

In  order  to  exhibit  the  smoothing  action  of 
the  circuit  let  us  denote  the  observed  velocity 
components,  referred  to  the  upright  and  fixed 

c  This is the coordinate system which was used in the
experimental T15 director. A complete prediction circuit
can be obtained by using the three voltages described
here as inputs to the lead servos in the T15
system. In the actual T15 system, rates in the tilted
and rotating coordinate system were obtained by the
so-called "memory point" method. The voltages $\dot{D}$, $D\dot{E}$,
etc., required with the present method, might be obtained
with the help of tachometers attached to the
tracking shafts to measure the instantaneous values of
$\dot{D}$, $\dot{E}$, and $\dot{A}$. An equivalent to the variable smoothing
of the memory point method can be obtained by making
the gains in the feedback paths in Figure 1 variable
according to the principles described in a later section.




rectangular coordinate system, by $u_x$ and $u_y$,
so that

$$u_x = \dot{D} \cos E - D\dot{E} \sin E$$
$$u_y = D\dot{E} \cos E + \dot{D} \sin E . \qquad (3)$$

Substituting (2) and (3) into (1), we get

$$\mu \dot{v}_x + v_x = u_x$$
$$\mu \dot{v}_y + v_y = u_y .$$

These show clearly that $v_x$ and $v_y$ are smoothed
values of $u_x$ and $u_y$, respectively. If $\mu$ is constant
the smoothing is of fixed exponential type. If $\mu$
is proportional to the time up to some maximum
value, the smoothing is of the variable
type described in Sections 14.6 and 14.7.
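The substitution can be checked numerically. The sketch below assumes the relations as read here: the rotation $v_x = V_1\cos E - V_2\sin E$, $v_y = V_2\cos E + V_1\sin E$, the feedback condition with gain $\mu$ acting on the differentiated voltages, and the fixed-coordinate rates $u_x$, $u_y$. The function name and the sample values are illustrative only.

```python
import math


def smoothing_residual(E, D_dot, DE_dot, vx_dot, vy_dot, mu):
    """Return the residuals of mu*dv/dt + v = u in the x and y coordinates.

    Both residuals should vanish (up to rounding) for any choice of inputs
    if the coordinate conversions cancel as the text claims.
    """
    c, s = math.cos(E), math.sin(E)
    # Feedback condition: terminal voltages in the tilted coordinate system.
    V1 = D_dot - mu * (vx_dot * c + vy_dot * s)
    V2 = DE_dot - mu * (vy_dot * c - vx_dot * s)
    # First coordinate converter: rotate into the fixed coordinate system.
    vx = V1 * c - V2 * s
    vy = V2 * c + V1 * s
    # Observed velocity components in the fixed coordinate system.
    ux = D_dot * c - DE_dot * s
    uy = DE_dot * c + D_dot * s
    return vx - (ux - mu * vx_dot), vy - (uy - mu * vy_dot)


if __name__ == "__main__":
    print(smoothing_residual(0.7, -35.0, 12.0, 1.5, -0.8, 4.0))
```

The elevation angle drops out because the two rotations are inverse to one another, which is why the unsmoothed orientation of the tilted system does no harm.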

To  complete  the  discussion  of  the  circuit  we 
observe  that  by  (1) 

$$V_1 = v_x \cos E + v_y \sin E$$
$$V_2 = v_y \cos E - v_x \sin E .$$

These show that $V_1$ and $V_2$ are the smoothed
rate  components  referred  to  the  tilted  and 
rotating  rectangular  coordinate  system.  The 
fact  that  the  orientation  of  this  coordinate  sys- 
tem, which  depends  upon  the  observed  angular 
height  E,  is  not  smoothed  makes  no  difference 
to  the  computation  of  the  leads  because  this 
computation  is  made  instantaneously  in  the 
same  coordinate  system  to  which  the  smoothed 
rate  components  are  instantaneously  referred. 

The  analysis  in  the  general  case  including 
all  three  coordinates  is  of  the  same  nature. 
Since  the  rate  components  in  fixed  rectangular 
coordinates  appear  in  the  middle  of  the  feed- 
back path,  it  is  perhaps  not  fair  to  regard  the 
circuit  as  an  illustration  of  a  data-smoothing 
device  which  does  not  rely  upon  the  explicit 
isolation  of  the  geometrical  parameters  of  the 
assumed  target  path.  It  should  be  pointed  out, 
however,  that  in  comparison  with  a  straight- 
forward geometrical  solution  in  which  velocity 
components  in  fixed  coordinates  are  first  isolated 
explicitly, then smoothed, and then used to form
the  basis  of  prediction,  the  circuit  in  Figure  1 
has  the  advantage  that  most  of  the  components 
can  be  built  with  very  low  precision.  What  is 
transmitted  around  the  feedback  loop  is  essen- 

tially the  tracking  errors  only.  Since  tracking 
errors  are  always  small,  very  high  percentage 
errors in the system can be tolerated.ᵈ








Figure  2.    Feedback  smoothing  in  two  coordinates. 


It  was  mentioned  earlier  that  changing  the 
data-smoothing  network  with  the  target  coor- 
dinates represented  one  way  in  which  the  re- 
sults obtained  from  fixed  networks  could  be 

d  An  exception  to  this  statement  must  be  made  for 
errors  in  the  coordinate  converters  which  fluctuate 
rapidly  with  target  position. 




generalized. In a sense, the coordinate conversions
of Figure 1 are illustrations of these
possibilities. A better illustration, however, is
provided by the circuit of Figure 3. The structure
is intended to give smooth slant range
rate from slant range data, under the assumption
of unaccelerated straight line target motion.

Figure 3. Feedback smoothing with smoothing
variable with position coordinates.

The relation between input and output in
Figure 3 is readily seen to beᵉ

$$\mu \frac{d}{dt}\left[\frac{d}{dt}(DV)\right] + V = \frac{dD}{dt} \qquad (4)$$

where $\mu$ is the amplifier gain, D is slant range,
and V = dD/dt is slant range rate.

The principle of the circuit depends upon the
fact that under the assumed target motion the
square of the slant range, $D^2$, should be a
quadratic function of time, so that $D\,(dD/dt)$
should be a linear function of time and
$(d/dt)[D\,(dD/dt)]$ should be a constant. This last is
the quantity which is fed back in Figure 3.
If it actually is a constant, it has no further
influence on the calculation, since the forward
circuit includes a differentiator, and the operation
of the circuit is the same as though no
feedback term were present. This can be verified
by setting $D = D_0 = \sqrt{a + 2bt + ct^2}$, corresponding
to ideal straight line flight, in equation (4).
It is readily seen that the equation is
satisfied by

$$V = V_0 = \frac{b + ct}{\sqrt{a + 2bt + ct^2}} = \frac{dD_0}{dt} ,$$

the first or feedback term being zero.
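The constancy of the fed-back quantity can be illustrated numerically. The following is a minimal sketch, assuming a target moving at constant velocity; the starting position, the velocity, and the function names are hypothetical values chosen only for illustration. For such a path $D\,(dD/dt)$ equals the dot product of position and velocity, so its time derivative should equal the squared speed.

```python
import math


def range_and_rate(p0, v, t):
    """Slant range D and range rate dD/dt for straight line flight p0 + v*t."""
    pos = [p + vi * t for p, vi in zip(p0, v)]
    D = math.sqrt(sum(x * x for x in pos))
    D_dot = sum(x * vi for x, vi in zip(pos, v)) / D
    return D, D_dot


def fed_back_quantity(p0, v, t, h=1e-3):
    """Central-difference estimate of (d/dt)[D (dD/dt)] at time t."""
    def d_times_rate(s):
        D, D_dot = range_and_rate(p0, v, s)
        return D * D_dot
    return (d_times_rate(t + h) - d_times_rate(t - h)) / (2.0 * h)


# Hypothetical straight line course (consistent length and time units assumed).
P0 = (4000.0, -3000.0, 2000.0)
VEL = (-120.0, 80.0, 5.0)
SPEED_SQUARED = sum(vi * vi for vi in VEL)

if __name__ == "__main__":
    for t in (0.0, 5.0, 20.0, 40.0):
        print(t, fed_back_quantity(P0, VEL, t), SPEED_SQUARED)
```

The range rate itself varies strongly along such a course, but the fed-back quantity does not, which is what allows the loop to act only on departures from straight line flight.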

If  D  does  not  correspond  exactly  to  straight 
line flight, either because of tracking errors
or  actual  target  maneuvers,  on  the  other  hand, 
the  feedback  voltage  is  no  longer  constant.  In 
this  case  transmission  around  the  loop  can 
exist  and  the  degenerative  feedback  action 
produces  smoothing  in  both  the  input  and  the 
output  voltage.  In  calculating  the  exact  effect 
we  must  take  account  of  the  fact  that  the  feed- 
back voltage  depends  upon  the  D  potentiometer 
in  the  feedback  circuit  as  well  as  upon  the  out- 
put voltage  V.  Since  the  D  potentiometer  set- 
ting must  include  the  errors  in  the  input  data, 
this  means  that  the  output  voltage  is  not  per- 
fectly smoothed,  even  with  unlimited  gain 
around  the  loop.  The  percentage  error  in  the 
output  rate  tends  in  the  limit  to  approximate 
the  percentage  error  in  D  itself.  For  practical 
purposes,  however,  this  is  a  very  satisfactory 
result,  since  in  the  absence  of  smoothing  per- 
centage errors  in  rates  are  usually  many  times 
those  of  the  corresponding  coordinates. 

It is apparent that it should be possible to
construct many circuits of this general type
from the differential equations of the trajectory.
A second example is furnished by Figure
4. The operation of the circuit is essentially
similar to that of Figure 3. It depends upon
the fact that in unaccelerated straight line
motion the quantity $D^2 \dot{A} \cos^2 E$ is a constant.
Instead of multiplying by $D^2$ and $\cos^2 E$ at a
single point in the feedback loop, however,
separate multiplications by D and cos E are
introduced in the forward and feedback circuits.
This permits the output to appear as a
smoothed value of the quantity $D\dot{A}\cos E$,
which will be recalled as one of the primary
quantities in the circuit of Figure 1.

e  The condensers in Figure 3 symbolize differentiators.

Figure 4. Another example of feedback smoothing
with smoothing variable with position coordinates.
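The invariant used by Figure 4 can be checked in the same way. The sketch below assumes a constant-velocity target and takes azimuth as the angle of the horizontal projection of the position; the course values and function names are hypothetical. Since $D\cos E$ is the horizontal range, $D^2\dot{A}\cos^2 E$ should equal $x\dot{y} - y\dot{x}$, which does not change along a straight line course.

```python
import math


def spherical(p0, v, t):
    """Slant range D, elevation E and azimuth A of the point p0 + v*t."""
    x, y, z = (p + vi * t for p, vi in zip(p0, v))
    D = math.sqrt(x * x + y * y + z * z)
    E = math.asin(z / D)
    A = math.atan2(y, x)
    return D, E, A


def azimuth_invariant(p0, v, t, h=1e-3):
    """Estimate D^2 * dA/dt * cos^2(E), with dA/dt from a central difference."""
    D, E, _ = spherical(p0, v, t)
    A_plus = spherical(p0, v, t + h)[2]
    A_minus = spherical(p0, v, t - h)[2]
    A_dot = (A_plus - A_minus) / (2.0 * h)
    return D * D * A_dot * math.cos(E) ** 2


# Hypothetical course; x stays positive so the azimuth never wraps around.
P0 = (3000.0, -2000.0, 1500.0)
VEL = (-50.0, 120.0, 10.0)
EXPECTED = P0[0] * VEL[1] - P0[1] * VEL[0]

if __name__ == "__main__":
    for t in (0.0, 10.0, 30.0):
        print(t, azimuth_invariant(P0, VEL, t), EXPECTED)
```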


In  addition  to  making  the  parameters  of  the 
data-smoothing  network  vary  as  functions  of 
the  coordinates  of  target  position  we  may  also 
make  them  variable  as  functions  of  time.  The 
advantage  of  variation  with  time  can  be  under- 
stood by  going  back  to  the  discussion  of  the 
analytic  arc  assumption  and  its  consequences 
for  fixed  data-smoothing  networks,  as  given  in 
Chapters  9,  10,  and  11.  It  will  be  recalled  that 
for  any  given  settling  time  there  was  an  opti- 
mum choice  of  the  network's  weighting  func- 
tion. The  choice  of  the  settling  time  itself,  how- 
ever, was  always  a  compromise.  On  the  one 
hand,  making  the  settling  time  too  short  led 
to  too  little  smoothing,  so  that  the  dispersion 
in  the  resulting  fire  became  excessive.  On  the 
other  hand,  too  long  a  settling  time  meant  that 
data  from  previous  unrelated  segments  were 
retained  in  the  smoothing  circuit  during  too 
large  a  proportion  of  an  average  individual  seg- 
ment of  the  target  path,  leaving  too  small  a 
residue of the average segment as useful firing time.

It  is  evident  that  it  is  theoretically  possible 
to  escape  the  consequences  of  this  compromise 
by  resorting  to  variable  structures.  We  need 
merely  assume  that  the  network  always  has  a 
weighting  function  appropriate  for  a  settling 
time  equal  to  the  time  since  the  last  change  in 
course.  This  would  give  a  small  amount  of 
smoothing  shortly  after  a  change  in  course, 
with  more  smoothing  and  consequently  greater 
accuracy  later  on.  No  firing  time,  however,  is 
sacrificed  waiting  for  the  network  to  settle. 

In  order  to  exploit  these  possibilities  we 
must,  of  course,  be  able  to  design  networks  to 
give  at  least  approximately  the  right  sequence 
of weighting functions. It is also necessary to
provide  some  sort  of  auxiliary  controlling 
mechanism  which  will  sense  changes  in  target 
course  and  return  the  variable  circuits  in  the 
smoothing network proper to their initial positions.
These are both difficult problems which
have been incompletely explored. Some elementary
solutions, based principally upon modifications
of the degenerative feedback smoothing
circuit  of  Figure  2,  of  Chapter  10,  are,  how- 
ever, given  later  in  the  chapter.  As  a  prelimi- 
nary, the  next  section  gives  a  formal  extension 
of  the  general  polynomial  expansion  method  of 
Chapter  11  to  the  variable  case. 


The  extension  of  the  general  method  of 
Chapter 11 to the variable case requires two modifications:

1.  The  lower  limit  of  the  integral  to  be 
minimized  is  now  taken  as  zero,  in  anticipation 
of  the  possibility  of  discriminating  between  rele- 
vant and  irrelevant  data  on  the  basis  of  time  of 

2.  The  weighting  function  may  now  depend 
more  generally  upon  the  variable  of  integration 
and  the  upper  limit  of  integration. 

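With these modifications there is no longer
any advantage in conducting the analysis in
terms of the age variable $\tau$. To deal directly
with the minimization of the integral

$$\int_0^t [E(\lambda) - \bar{E}(\lambda)]^2\, W_0(t,\lambda)\, d\lambda , \qquad (5)$$

let

$$\bar{E}(\lambda) = V_0 + V_1 G_1(t,\lambda) + \cdots + V_n G_n(t,\lambda) , \qquad (6)$$

where $G_m(t,\lambda)$ is an $m$th degree polynomial in
$\lambda$. Also, let

$$\int_0^t W_0(t,\lambda)\, d\lambda = 1$$

$$\int_0^t G_l(t,\lambda)\, G_m(t,\lambda)\, W_0(t,\lambda)\, d\lambda = 0 \quad \text{if } l \neq m$$
$$\qquad\qquad = 1/k_m \quad \text{if } l = m$$

$$(G_0 = 1,\ k_0 = 1) .$$

Then (5) is a minimum with respect to the
$V_m$'s in (6) if

$$V_m(t) = \int_0^t E(\lambda)\, W_m(t,\lambda)\, d\lambda \qquad (7)$$

where

$$W_m(t,\lambda) = k_m\, G_m(t,\lambda)\, W_0(t,\lambda) . \qquad (8)$$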
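A small numerical illustration of the expansion may help. The sketch below is not taken from the text: it assumes the simplest case of a uniform weighting function $W_0 = 1/t$ on $0 \le \lambda \le t$, for which the first-degree orthogonal polynomial is $G_1 = \lambda - t/2$ with $k_1 = 12/t^2$. The integrals are evaluated by Simpson's rule, and all function names are illustrative.

```python
def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def fit_and_predict(E, t, t_f):
    """Return (V0, V1, prediction at t + t_f) for data E observed on [0, t].

    Assumes the uniform weighting W0 = 1/t, so that G1 = lam - t/2 and
    k1 = 12 / t**2 satisfy the orthogonality and normalization conditions.
    """
    def W0(lam):
        return 1.0 / t

    def G1(lam):
        return lam - t / 2.0

    k1 = 12.0 / t ** 2
    V0 = simpson(lambda lam: E(lam) * W0(lam), 0.0, t)
    V1 = simpson(lambda lam: E(lam) * k1 * G1(lam) * W0(lam), 0.0, t)
    return V0, V1, V0 + G1(t + t_f) * V1


if __name__ == "__main__":
    # Noise-free linear data: V0 is the mean, V1 the slope.
    print(fit_and_predict(lambda lam: 3.0 + 2.0 * lam, 8.0, 4.0))
```

For noise-free linear data the coefficient $V_0$ is the mean value over the interval, $V_1$ is the slope, and the prediction coincides with the extrapolated line.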

The possibility of physically realizing the
$V_m(t)$ depends upon the possibility of realizing
networks with impulsive admittances $W_m(t,\lambda)$
in the sense that $W_m(t,\lambda)$ is the response of a
network, at time t, to a unit impulse applied at
time $\lambda$, where $0 < \lambda < t$. Taking this possibility
for granted, the predicted value $E(t + t_f)$ is,
according to (6), a variable linear combination
of the $V_m(t)$, viz.,

$$E(t + t_f) = V_0(t) + G_1(t, t + t_f)\, V_1(t) + \cdots + G_n(t, t + t_f)\, V_n(t) . \qquad (9)$$

It is clear that all of the $W_m(t,\lambda)$ as well as
all of the $G_m(t,\lambda)$ for m = 1, 2, ... are determined
by $W_0(t,\lambda)$. The latter is determined as
the best weighting function for position data
smoothing, depending upon the characteristics
of the noise associated with the position data.
The general methods of determining the best
weighting function with fixed smoothing time,
described in Chapter 10, may be used to determine
the best weighting function with variable
smoothing time.

Under the assumption that the spectrum of
the noise associated with the signal $S(t)$ has a
uniform slope of $6k$ db per octave, we may take
over from Section 11.3 the result that the best
weighting function is

$$w_k(t,\lambda) = \frac{(2k+1)!}{(k!)^2}\,\frac{1}{t}\left[\frac{\lambda}{t}\left(1 - \frac{\lambda}{t}\right)\right]^k , \qquad 0 \le \lambda \le t . \qquad (10)$$

The response of the network is then

$$V(t) = \int_0^t S(\lambda)\, w_k(t,\lambda)\, d\lambda . \qquad (11)$$

It will be illuminating to consider a few
special cases of (11).
For $k = 0$, we have

$$V(t) = \frac{1}{t}\int_0^t S(\lambda)\, d\lambda . \qquad (12)$$

Multiplying through by $t$ and differentiating
we get

$$t\,\dot{V}(t) + V(t) = S(t) . \qquad (13)$$

This suggests the circuit shown in Figure 5.ᶠ
For $k = 1$, we have

$$V(t) = \frac{6}{t^3}\int_0^t S(\lambda)\,\lambda\,(t - \lambda)\, d\lambda .$$

Multiplying through by $t^3$ and differentiating
twice we get

$$\frac{t^2}{6}\,\ddot{V} + t\,\dot{V} + V = S$$

which may be written in the form

This suggests the network shown in Figure 6.ᵍ
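The two special cases can be illustrated with a short sketch. It assumes that the weighting function has the normalized form $w_k = \frac{(2k+1)!}{(k!)^2\,t}\,[(\lambda/t)(1-\lambda/t)]^k$, which agrees with the cases $k = 0$ and $k = 1$ quoted above; the function names are illustrative. The recursion in `running_average` is a sampled-data analogue of $t\dot{V} + V = S$ (a smoothing gain falling off as $1/n$), not a model of the circuit of Figure 5.

```python
from math import factorial


def weight(k, t, lam):
    """Assumed weighting function w_k(t, lam) for 0 <= lam <= t, integer k."""
    x = lam / t
    return factorial(2 * k + 1) / factorial(k) ** 2 / t * (x * (1.0 - x)) ** k


def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def running_average(samples):
    """Discrete analogue of t*dV/dt + V = S: correction gain 1/n at sample n."""
    v = 0.0
    out = []
    for n, s in enumerate(samples, start=1):
        v += (s - v) / n
        out.append(v)
    return out


if __name__ == "__main__":
    for k in (0, 1, 2):
        print(k, simpson(lambda lam: weight(k, 10.0, lam), 0.0, 10.0))
    print(running_average([2.0, 4.0, 6.0, 8.0]))
```

The weights integrate to unity for every $k$, and the recursion reproduces the arithmetic mean of all samples received so far, that is, uniform weighting over the whole interval since the start.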



By  generalizing  the  above  results  in  various 
ways  a  large  number  of  other  examples  of 
variable  smoothing  networks  can  be  constructed. 
Since  unlimited  variation  in  the  smoothing 
time  is  not  practically  possible,  or  perhaps  even 
tactically  optimal,  however,  it  is  desirable  in 
discussing  any  further  examples  to  include  also 
the  possibility  that  the  range  of  variation  in 
the network may be restricted. For any positive
integral value of k in (11) the differential
equation for V(t) is of the type which may be
reduced by the transformation $t = e^x$ to a linear
differential equation with constant coefficients.ʰ
In general, this facilitates the determination of
what happens to the weighting function
$w_k(t,\lambda)$ when $t > T$ if the variability of the
network  is  stopped  at  time  T.  In  the  case  of  the 
first-order  equation  (13),  however,  it  is  just 
as  easy  to  deal  directly  in  terms  of  the  natural 

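A more general form for (13), which readily
yields the effects of a sudden or gradual stoppage
of the variability of the network, is

$$\frac{\phi(t)}{\dot{\phi}(t)}\,\dot{V}(t) + V(t) = S(t) . \qquad (14)$$

This corresponds to the response

$$V(t) = \frac{1}{\phi(t)} \int_0^t S(\lambda)\,\dot{\phi}(\lambda)\, d\lambda$$

whence the weighting function is

$$w(t,\lambda) = \frac{\dot{\phi}(\lambda)}{\phi(t)} . \qquad (15)$$

f  This circuit is due to S. Darlington.

g  Due to B. T. Weber.

h  See Section A.11 for a more general transformation.

The general relation (14) may be realized
with the network of Figure 5, by varying the
resistance in accordance with

$$R = \frac{1}{C}\,\frac{\phi(t)}{\dot{\phi}(t)} , \qquad t > 0 .$$

Figure 5. Time-variable smoothing circuit giving
uniform weighting function.

Figure 6. Time-variable smoothing circuit giving
parabolic weighting function.

However, a more practical circuit results from
the introduction of variable potentiometersⁱ in
both the capacity and resistance paths of the
original feedback smoothing circuit of Figure
2, Chapter 10. This is shown in Figure 7.ʲ It
may be noted that the feedback circuit is also
applicable to the two cases discussed in the
preceding section. It has the advantage for
these applications that it does not require the
zero-impedance generators and infinite-impedance
loads of Figures 5 and 6.

Figure 7. Limited range time-variable feedback
smoothing circuit.

i  In some cases a variable potentiometer may turn
out to be a switch.

j  This circuit is due to S. Darlington.

As an example of (14) we may take

$$\phi(t) = t , \qquad 0 < t < T$$
$$\phi(t) = T e^{(t-T)/T} , \qquad t > T .$$

Then

$$\frac{\phi(t)}{\dot{\phi}(t)} = t , \qquad 0 < t < T$$
$$\frac{\phi(t)}{\dot{\phi}(t)} = T , \qquad t > T .$$

Hence, in Figure 7, if RC = T,

$$f_C(t) = \frac{t}{T} , \quad f_R(t) = 0 , \qquad 0 < t < T$$
$$f_C(t) = 1 , \quad f_R(t) = 1 , \qquad t > T .$$

This example obviously calls for a linear potentiometer
in the condenser path and a switch in
the resistance path. The weighting function obtained
is, by (15),

$$w(t,\lambda) = \frac{1}{t} , \qquad 0 < \lambda < t < T$$
$$w(t,\lambda) = \frac{1}{T}\, e^{-(t-T)/T} , \qquad 0 < \lambda < T < t$$
$$w(t,\lambda) = \frac{1}{T}\, e^{-(t-\lambda)/T} , \qquad 0 < T < \lambda < t .$$

This is illustrated in Figure 8 for T = 10, t = 5,
10, 20.

Figure 8. First example of weighting function
produced by circuit of Figure 7.

A second example is furnished by taking

$$\phi(t) = t^k , \qquad 0 < t < T$$
$$\phi(t) = T^k e^{k(t-T)/T} , \qquad t > T .$$

Then

$$\frac{\phi(t)}{\dot{\phi}(t)} = \frac{t}{k} , \qquad 0 < t < T$$
$$\frac{\phi(t)}{\dot{\phi}(t)} = \frac{T}{k} , \qquad t > T .$$

Hence, in Figure 7, if RC = T/k,

$$f_C(t) = \frac{t}{T} , \quad f_R(t) = 1 - \frac{1}{k} , \qquad 0 < t < T$$
$$f_C(t) = 1 , \quad f_R(t) = 1 , \qquad t > T .$$

The first example is a special case of this one.
The weighting function obtained is, by (15),

$$w(t,\lambda) = \frac{k\lambda^{k-1}}{t^k} , \qquad 0 < \lambda < t < T$$
$$w(t,\lambda) = \frac{k\lambda^{k-1}}{T^k}\, e^{-k(t-T)/T} , \qquad 0 < \lambda < T < t$$
$$w(t,\lambda) = \frac{k}{T}\, e^{-k(t-\lambda)/T} , \qquad 0 < T < \lambda < t .$$

This is illustrated in Figure 9 for k = 3/2,
T = 10, t = 5, 10, 20.

Figure 9. Second example of weighting function
produced by circuit of Figure 7 (k = 3/2, T = 10).

A third example is furnished by taking

$$\phi(t) = \frac{t}{1 - t/2T} , \qquad 0 < t < T$$
$$\phi(t) = 2T e^{2(t-T)/T} , \qquad t > T .$$

Then

$$\frac{\phi(t)}{\dot{\phi}(t)} = t\left(1 - \frac{t}{2T}\right) , \qquad 0 < t < T$$
$$\frac{\phi(t)}{\dot{\phi}(t)} = \frac{T}{2} , \qquad t > T .$$

Hence, in Figure 7, if RC = T/2,

$$f_C(t) = \frac{2t}{T}\left(1 - \frac{t}{2T}\right) , \quad f_R(t) = \frac{t}{T} , \qquad 0 < t < T$$
$$f_C(t) = 1 , \quad f_R(t) = 1 , \qquad t > T .$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \frac{1 - t/2T}{t\,(1 - \lambda/2T)^2} , \qquad 0 < \lambda < t < T$$
$$w(t,\lambda) = \frac{1}{2T\,(1 - \lambda/2T)^2}\, e^{-2(t-T)/T} , \qquad 0 < \lambda < T < t$$
$$w(t,\lambda) = \frac{2}{T}\, e^{-2(t-\lambda)/T} , \qquad 0 < T < \lambda < t .$$

This is illustrated in Figure 10 for T = 10,
t = 5, 10, 20.

Figure 10. Third example of weighting function
produced by circuit of Figure 7.

A fourth example is furnished by taking

$$\phi(t) = e^{kt} - 1 , \qquad t > 0 .$$

Then

$$\frac{\phi(t)}{\dot{\phi}(t)} = \frac{1 - e^{-kt}}{k} , \qquad t > 0 .$$

Hence, in Figure 7, if RC = 1/k,

$$f_C(t) = f_R(t) = 1 - e^{-kt} , \qquad t > 0 .$$

The weighting function obtained is, by (15),

$$w(t,\lambda) = \frac{k}{1 - e^{-kt}}\, e^{-k(t-\lambda)} , \qquad 0 < \lambda < t .$$

For any value of t this weighting function is
exponential in $\lambda$.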

Because  there  has  been  no  demand  for  varia- 
ble networks  in  the  field  of  communications, 
the  technique  of  designing  practical  variable 
networks  is  in  a  very  rudimentary  stage  com- 
pared to  that  of  designing  fixed  networks.  In 
the  remainder  of  this  chapter  we  shall  describe 



some  of  the  circuits  which  have  been  developed 
for  specific  practical  applications. 

A memory point method of obtaining
smoothed rates, based upon (12), is illustrated
below. If S(t), the quantity to be smoothed,
represents the time derivative $\dot{E}(t)$ of the position
data E(t), then the average rate is given by

$$V(t) = \frac{E(t) - E(0)}{t} .$$

Under the assumption that the position data,
aside from tracking errors, is a linear function
of time, the average rate is also the smoothed
rate. If the position data is represented by the
angular displacement of a shaft in the computer,
the quantity E(0) is readily fixed by
providing a second shaft which is coupled to
the first shaft until t = 0 when the coupling is
broken. Potentiometers mounted on the shafts
are energized by a voltage varying as a function
of time in the manner indicated in Figure
11. The manner in which the smoothed rate is
obtained is clear.

Figure 11. Memory point method of obtaining
smoothed rate.

The memory point method of obtaining
smoothed rates is used in the T15 antiaircraft
director.⁴ In this application, however, it is
somewhat more complicated than in the simple
illustration described above. This is due to the
fact that the position data and the memory
point are in the polar coordinate system,
whereas the rate components are referred to
a tilted and rotating rectangular coordinate
system which is determined by the instantaneous
line of sight.

Figure 12 shows a way of securing variable
smoothing in a purely electrical circuit.ᵏ Except
for the fact that the division of the current
through the condensers is varied discontinuously
instead of continuously, this circuit corresponds
to the first or the second example discussed
in Section 14.7.

Figure 12. Specific limited range time-variable
feedback smoothing circuit.

Figure 13 shows the variable smoothing circuitˡ
for smoothing first derivatives in the
M9A1-E1 antiaircraft director.⁸ This circuit
corresponds approximately to the second example
of the differential equation (14) given
above. The variable element is a thermistor
which is heated up to a high temperature, practically
instantaneously, by the heater, and then
allowed to cool off naturally. By choosing the
electrical and thermal constants in the circuit
correctly the resulting smoothing can be made
to approximate that obtained in a memory
point circuit.

Figure 13. Another specific limited range
time-variable feedback smoothing circuit.

k  This circuit is due to S. Darlington.

l  Developed by R. F. Wick.

As  noted  earlier,  all  these  variable  circuits 
require  some  auxiliary  control  means  to  reset 
the  variable  circuits  to  zero  whenever  a  new 
target  is  engaged  or  the  current  target  makes 
a  sudden  change  in  course.  In  the  T15  memory 
point  system  this  function  was  performed  by  an 
operator.  The  operator  was  aided  by  a  series  of 
meters  which  compared  the  instantaneous 
memory  point  rates  with  average  rates  set  in 
some  time  previously  by  hand.  The  visual  in- 
dication of  a  change  in  course,  calling  for  the 
selection  of  a  new  memory  point,  was  a  rela- 
tively large,  smoothly  and  decisively  varying 
deflection  on  the  meters.  In  contrast,  normal 
tracking  errors  appeared  as  relatively  small 
random  fluctuations  of  the  needles.  The  circuits 
of  Figures  7  and  12,  which  were  intended  for 
bombsight  applications,  were  also  under  the 
control  of  an  operator,  who  was  supposed  to 
start  the  mechanism  at  the  beginning  of  each 
bombing  run. 

Two  control  methods  were  used  for  the  cir- 
cuit of  Figure  13.  In  one,  large  changes  in  rate, 
corresponding  to  probable  changes  in  target 

course,  were  distinguished  by  comparing  the 
instantaneous  value  of  the  target  rate,  as  ob- 
tained directly  from  a  differentiator,  with  the 
smoothed  value  obtained  at  the  output  of  the 
smoothing  circuit.  In  the  other  method,  equiva- 
lent information  was  obtained  by  again  differ- 
entiating the  instantaneous  value  of  the  target 
rate,  making  a  second  derivative  of  the  target 
coordinate.  In  either  case  this  rate  difference 
or  second  derivative  information  was  used  to 
control  a  gas  tube,  which  went  off,  supplying 
heating  current  to  the  variable  thermistor, 
whenever  the  voltage  applied  to  it  exceeded  a 
certain  threshold.   This  threshold  evidently 
marks  the  minimum  change  in  course  for  which 
the  variable  network  will  be  reset.  In  order  to 
permit  the  use  of  a  low  threshold,  without 
making  the  circuit  unduly  liable  to  false  opera- 
tion because  of  the  effect  of  tracking  errors, 
the  gas  tube  input  voltage  was  first  transmitted 
through  a  low-pass  filter  which  suppressed 
most  of  the  energy  due  to  tracking  errors.  A 
considerable  amount  of  work  was  done  on  the 
proportioning  of  this  filter  to  provide  the  best 
protection  against  false  operation  with  a  low 
threshold  and  with  minimum  delay  in  resetting 
in  case  a  change  of  course  actually  does  occur, 
but  the  problem  remains  an  interesting  subject 
for  research. 
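The first control method can be sketched in sampled-data form. The following is only an illustration of the logic described above, not a model of the gas tube circuit: the discrepancy between the instantaneous and the smoothed rate is passed through a simple first-order low-pass filter, and the smoother is restarted when the filtered value exceeds a threshold. The function name, the filter gain, and the threshold are all hypothetical.

```python
def smooth_with_reset(rates, threshold, filter_gain=0.5):
    """Running-average smoother that restarts after a marked change in rate.

    Returns the smoothed rates and the sample indices at which a reset occurred.
    """
    smoothed = 0.0   # current smoothed rate
    count = 0        # samples averaged since the last reset
    filtered = 0.0   # low-pass filtered discrepancy (the trigger signal)
    out, resets = [], []
    for i, s in enumerate(rates):
        # Low-pass filter the difference between instantaneous and smoothed rate.
        filtered += filter_gain * ((s - smoothed) - filtered)
        # Reset only on a large, persistent discrepancy, never at start-up.
        if count > 0 and abs(filtered) > threshold:
            count = 0
            filtered = 0.0
            resets.append(i)
        count += 1
        smoothed += (s - smoothed) / count  # variable smoothing, gain 1/count
        out.append(smoothed)
    return out, resets


if __name__ == "__main__":
    course_change = [1.0] * 20 + [5.0] * 20
    print(smooth_with_reset(course_change, threshold=1.0)[1])
    tracking_noise = [1.0 + 0.2 * (-1) ** i for i in range(40)]
    print(smooth_with_reset(tracking_noise, threshold=1.0)[1])
```

With these illustrative settings a step in rate restarts the averaging at the sample where the step occurs, while small alternating errors leave the smoother undisturbed. A heavier filter would permit a lower threshold at the cost of a longer delay in resetting, which is the compromise discussed above.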



THIS  APPENDIX  GIVES  a  summary  of  linear 
network  theory  which  is  pertinent  to  the 
analysis  and  design  of  data-smoothing  and 
prediction  circuits.  It  is  incomplete  in  many 
respects  and  should  therefore  be  supplemented 
by  reference  to  established  textbooks  on  the 
subject.  However,  it  contains  some  results 
which  are  new. 

The  present  summary  will  be  concerned 
mainly  with  fixed  linear  networks.  Variable 
linear  networks  will  be  considered  briefly  in 
the  last  section. 


A  fixed  linear  transmission  network  is  one  in 
which  the  response  V(t)  is  related  to  the  im- 
pressed signal  E(t)  by  a  linear  differential 
equation  of  the  form 

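$$b_n \frac{d^n V}{dt^n} + b_{n-1} \frac{d^{n-1} V}{dt^{n-1}} + \cdots + b_0 V = a_m \frac{d^m E}{dt^m} + a_{m-1} \frac{d^{m-1} E}{dt^{m-1}} + \cdots + a_0 E \qquad (1)$$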

with  constant  coefficients.  It  is  well-known  that 
the  solutions  of  such  a  differential  equation 
obey  the  "superposition  principle."  This  makes 
it  possible  to  formulate  the  response  of  the  net- 
work to  any  signal,  in  terms  of  its  response  to 
certain  standard  signals. 

A convenient standard signal for analytical
purposes is the "unit impulse." It may be regarded
as the limit of the rectangular pulse
shown in Figure 1 as the duration of the pulse
is decreased indefinitely while the amplitude is
increased in such a way that the area under
the pulse is always unity. The limiting function
thus defined does not exist in a strict mathematical
sense. However, it is very convenient
for analytical purposes, and seldom leads to
difficulties, to proceed as though the limiting
function did exist. An impulse occurring at
$t = \lambda$ is conventionally denoted by the singular
function $\delta_0(t - \lambda)$ where

$$\delta_0(\tau) = 0 \quad \text{if } \tau \neq 0$$
$$\int_{-\infty}^{t} \delta_0(\tau)\, d\tau = 0 \quad \text{if } t < 0$$
$$\qquad\qquad = 1 \quad \text{if } t > 0 .$$

Figure 1. Rectangular pulse signal.

The  response  of  a  fixed  network  to  an  im- 
pulse or  any  form  of  signal  is  independent  of 
the  time  at  which  the  signal  is  applied,  provided 
it  is  expressed  as  a  function  of  the  time  relative 
to the application of the signal. Let W(t) be
the response to the signal $\delta_0(t)$. This is called
the "impulsive admittance" of the network.
Physically, it must be identically zero for negative
values of t. For an impulse applied at $t = \lambda$
the response will therefore be $W(t - \lambda)$, which
is identically zero for $t < \lambda$.

Figure 2. Derivation of superposition theorem.

A physical signal E(t) such as the one shown
in Figure 2 may be resolved into an infinite
succession of elementary impulses. The strength
of the typical elementary impulsive component,
such as the one shown in Figure 2 as occurring
at time $\lambda$, is $E(\lambda)\,d\lambda$. Its contribution to the
response at time t is $E(\lambda) \cdot W(t - \lambda)\,d\lambda$. Hence
the contribution of all the elementary impulsive
components of the signal, to the response at
time t, is given by the formula

$$V(t) = \int_{-\infty}^{t+} E(\lambda) \cdot W(t - \lambda)\,d\lambda \qquad (2)$$

This  is  one  form  of  the  "superposition  theo- 
rem" for  fixed  linear  networks. 
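The superposition integral lends itself to a direct numerical check. The following is a minimal sketch in Python, not part of the original text. It approximates (2) by a midpoint sum, assuming a hypothetical impulsive admittance W(tau) = alpha * exp(-alpha * tau) and a unit-step signal applied at zero, for which the exact response is 1 - exp(-alpha * t). The function name `response`, the value of `alpha`, and the step size are illustrative assumptions.

```python
import math

def response(signal, weight, t, dlam=1e-3, t_start=0.0):
    """Approximate V(t) = integral of E(lam) * W(t - lam) d(lam).

    `signal` is E(lam), assumed identically zero before t_start, and
    `weight` is W(tau), assumed identically zero for tau < 0, so the
    midpoint sum only needs to run from t_start up to t.
    """
    n = int(round((t - t_start) / dlam))
    total = 0.0
    for k in range(n):
        lam = t_start + (k + 0.5) * dlam   # midpoint of the k-th slice
        total += signal(lam) * weight(t - lam) * dlam
    return total

alpha = 2.0  # assumed decay rate of the hypothetical weighting function

def W(tau):
    return alpha * math.exp(-alpha * tau) if tau >= 0 else 0.0

def E(lam):
    return 1.0 if lam >= 0 else 0.0  # unit step applied at zero

v = response(E, W, t=1.5)
expected = 1.0 - math.exp(-alpha * 1.5)
print(v, expected)
```

Because the area under this W is unity, the response to the step settles toward the step amplitude, which is the weighted-average interpretation discussed in the text.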

Before  discussing  the  reasons  for  the  limits 
of  integration  indicated  in  (2),  it  will  be  help- 
ful to  consider  a  graphical  interpretation  other 
than  the  one  used  in  deriving  the  integral.  Let 
W(t)  be  of  the  form  shown  in  Figure  3,  and  let 
$E(\lambda)$ be of the form shown in Figure 4. To
determine the response V(t) at a given value
of t, the curve in Figure 3 is turned over from
right to left and placed over the curve in Figure 4
so that its right-hand edge is at $\lambda = t$. The
product of the two curves gives a third curve
(not shown), which is identically zero for all
$\lambda > t$. The area under the third curve is the response
V(t) at the given value of t. For progressively
larger values of t, the curve representing
$W(t - \lambda)$ in Figure 4 is simply slid to
the right with respect to the curve representing
$E(\lambda)$.

Figure 3. An impulsive admittance.

Figure 4. Graphical interpretation of superposition theorem.

Since a physical signal must certainly be
identically zero up to some definite time, or
since it must certainly have been applied to the
network at some definite time, that time could
be taken arbitrarily as zero and (2) could be
written in the form

$$V(t) = \int_{0}^{t} E(\lambda) \cdot W(t - \lambda)\,d\lambda. \qquad (3)$$

In this form, however, since

$$\int_{0}^{t} W(t - \lambda)\,d\lambda$$

is in general a function of t, the response could
not be interpreted as a weighted average of the
signal. On the other hand, since

$$\int_{-\infty}^{t+} W(t - \lambda)\,d\lambda = \int_{0-}^{\infty} W(\tau)\,d\tau$$

is independent of t, the response may be interpreted
as a weighted average of the signal, if

$$\int_{0-}^{\infty} W(\tau)\,d\tau = 1.$$

The necessity of taking the lower limit in (2)
as $-\infty$, in order to permit the interpretation
of the response as a weighted average of the
signal, is also expressed by the point of view
that a fixed network cannot make any physical
distinction between having no applied signal
and having an applied signal which happens to
be of zero amplitude.

Another shortcoming of the form (3) or, for
that matter, of the form (2) if we set t as the
upper limit of integration, comes from the consideration
of impulsive admittances of such a
nature that $W(t - \lambda)$ has certain kinds of singularities
at $\lambda = t$. For example, the case for
direct transmission, expressed in the form

$$V(t) = \int_{-\infty}^{t} E(\lambda) \cdot \delta_0(t - \lambda)\,d\lambda$$

is ambiguous because the singularity in the
integrand occurs exactly at one end of the
range of integration. However, the form

$$V(t) = \int_{-\infty}^{t+} E(\lambda) \cdot \delta_0(t - \lambda)\,d\lambda$$

leads, without ambiguity, to the result
V(t) = E(t). This example is not trivial. Every
network which transmits infinite frequency
must have an impulsive admittance of such a
nature that $W(t - \lambda)$ contains a singularity of
the form $\delta_0(t - \lambda)$. Any attempt to rule out such
a singularity on the ground that physical networks
cannot in fact transmit infinite frequency
complicates the analysis and design of
networks unduly. If a network is capable of,
or is expected to transmit, frequencies at the
top of the range of interest or importance, it is
simpler to assume that the network is capable
of, or is expected to transmit, all frequencies
above that range.

One other advantage of taking the limits of
integration as indicated in (2) may be called
to attention. Keeping in mind that $E(\lambda)$ is
identically zero for all values of $\lambda$ below some
definite though perhaps unknown value, and
that $W(t - \lambda)$ is identically zero for all values
of $\lambda > t$, it is clear that (2) may be integrated
partially any number of times without incurring
the burden of carrying a string of terms
outside of the integral. After one partial integration
we have

$$V(t) = \int_{-\infty}^{t+} E'(\lambda) \cdot A(t - \lambda)\,d\lambda \qquad (4)$$

where

$$A(t) = \int_{0-}^{t} W(\tau)\,d\tau. \qquad (5)$$

Since $E'(\lambda)$ is identically zero for all values of
$\lambda$ in which $E(\lambda)$ is identically zero, and since
$A(t - \lambda)$ is identically zero for all values of
$\lambda > t$, a second partial integration may be performed
with no more formal complication than
the first partial integration. The fact of the
matter is that the terms which ordinarily arise
in partial integrations, outside of the integral,
are here carried under the integral by singularities
of the integrand.

The superposition theorem in the form (4)
may be derived directly in a manner similar to
the derivation of (2). $A(t - \lambda)$ is the response
of the network to a Heaviside unit step function
$H(t - \lambda)$ applied at $t = \lambda$, where

$$H(t - \lambda) = \begin{cases} 0 & \text{when } t < \lambda \\ 1 & \text{when } t > \lambda. \end{cases}$$

The signal is resolved into an infinite succession
of elementary step functions of amplitude
$E'(\lambda)\,d\lambda$ wherever $E(\lambda)$ is continuous, and
finite step functions of amplitude $dE(\lambda)$ wherever
$E(\lambda)$ has a finite discontinuity. The contribution
of each elementary step function to the
response at time t is $E'(\lambda) \cdot A(t - \lambda)\,d\lambda$, that
of each finite step function is $A(t - \lambda) \cdot dE(\lambda)$.
Hence, the response is given formally by (4)
with the understanding that $E'(\lambda)\,d\lambda$ is to be
interpreted as $dE(\lambda)$ wherever $E(\lambda)$ is discontinuous.

The response A(t) of the network to a
Heaviside unit step function H(t) applied at
t = 0 is called the "indicial admittance" of the
network.  It  is  more  familiar,  in  the  field  of 
linear  transmission  theory,  than  the  impulsive 
admittance  to  which  it  is  related  by  (5),  but  in 
this  monograph  preference  is  given  to  the  use 
of  the  impulsive  admittance.  In  the  theory  of 
linear  differential  equations  the  impulsive  ad- 
mittance is  known  as  a  Green's  function. 

It  is  often  convenient  to  express  the  response 
so  that  the  variable  of  integration  represents 
the  age  of  the  elementary  components  of  the 
signal.  Introducing  the  age  variable 

$$\tau = t - \lambda \qquad (6)$$

into (2), we have

$$V(t) = \int_{0-}^{\infty} E(t - \tau) \cdot W(\tau)\,d\tau. \qquad (7)$$

* Formula (4) may be written in the Stieltjes form

$$V(t) = \int_{-\infty}^{t+} A(t - \lambda)\,dE(\lambda).$$

Alternatively, we may take the point of view that
$E'(\lambda)$ contains impulsive singularities wherever $E(\lambda)$
is discontinuous. This point of view is generalized in
Appendix B.

In  this  form  it  is  clear  that  the  weighting  of 
signal  components  is  on  the  basis  of  age  only. 
A  fixed  network  may  be  said  to  have  a  memory 
which  is  a  function  only  of  the  age  of  past 

In  the  preliminary  stages  of  designing  a 
smoothing  network,  the  weighting  function 
$W(\tau)$ is generally prescribed to be identically
zero when $\tau > T$ say, as well as when $\tau < 0$.
This does not violate the conditions of physical
realizability. However, such a weighting function
cannot be obtained exactly with a network
of  a  finite  number  of  discrete  impedance  ele- 
ments. A  finite  network  invariably  yields  a 
weighting  function  with  a  "tail"  which  extends 
to  infinity. 


Theoretically,  the  impulsive  admittance  of  a 
prescribed  network  may  be  determined  directly 
from  the  differential  equations  of  the  network 
in  a  perfectly  straightforward  manner.  Prac- 
tically, however,  it  is  very  difficult  to  do  so  if 
the  network  has  more  than  two  meshes.  Fur- 
thermore, the  technical  problem  of  designing 
a  network  directly  from  a  prescribed  impulsive 
admittance  is  even  more  difficult,  particularly 
if the impulsive admittance is not exactly realizable.

These  difficulties  may  be  avoided  by  recourse 
to  the  highly  developed  methods  of  network 
analysis  and  synthesis  used  in  the  field  of  com- 
munication circuits.  These  methods  are  based 
upon  the  steady-state  properties  of  networks. 

If a signal consisting of the single sinusoid
$\cos \omega t$ is applied to an invariable or fixed
linear transmission network, the steady-state response [b]
will also be a single sinusoid of the
same frequency. The amplitude and phase of
the response, relative to the signal, will in
general depend upon the frequency. The response
may be regarded as the resultant of an
"inphase component" proportional to $\cos \omega t$,
and a "quadrature component" proportional to
$\sin \omega t$, with amplitude coefficients which are
functions of the frequency. Furthermore, since
the signal is an even function of the frequency,
the response should also be an even function
of the frequency. [c] Hence, the response will

b This is the response apart from transient components,
assuming that the latter vanish exponentially
with time after the signal is impressed.

c The signal is also an even function of the time but
this is due only to the particular choice of origin which
is arbitrary.




be of the form $G(\omega^2) \cos \omega t - \omega H(\omega^2) \sin \omega t$,
where G and H are even real functions of frequency.

By a suitable shift of the origin of time it
follows that if the impressed signal is $\sin \omega t$,
the steady-state response will be of the form
$G(\omega^2) \sin \omega t + \omega H(\omega^2) \cos \omega t$.

These two results may be combined into a
simpler expression without any loss of individuality.
Since $e^{i\omega t} = \cos \omega t + i \sin \omega t$ where
$i = \sqrt{-1}$, we have

$$V(t) = [G(\omega^2) + i\omega H(\omega^2)] \cdot e^{i\omega t} \quad \text{if } E(t) = e^{i\omega t}.$$

A further simplification may be achieved by replacing
$i\omega$ by p, and $G(-p^2) + pH(-p^2)$ by
Y(p), so that

$$V(t) = Y(p) \cdot e^{pt} \quad \text{if } E(t) = e^{pt}. \qquad (8)$$

Y(p) is called the "steady-state transmission
function" or just "transmission function" for short.
Strictly speaking, (8) expresses the relation
of steady-state response to signal only if $p = i\omega$.
However,  it  is  customarily  called  a  steady-state 
relation  even  when  p  is  not  a  pure  imaginary 
quantity.  It  may  be  noted  that  Y(p)  is  real 
when  p  is  real. 

The  simplicity  of  steady-state  analysis  de- 
rives from  the  fact  that  time  occurs  in  the 
signal  and  throughout  the  network  only  in  the 
form $e^{pt}$. In particular, the determination of
the transmission function is reduced to the
solution of simultaneous algebraic equations
which do not involve the time factor. For a network
in which the signal and the response are
related by the linear differential equation (1)
with constant coefficients, we obtain simply

$$Y(p) = \frac{a_0 + a_1 p + \cdots + a_m p^m}{b_0 + b_1 p + \cdots + b_n p^n}.$$

It  may  be  noted  that  the  poles  of  the  transmis- 
sion function,  also  referred  to  as  "infinite-gain 
points"  in  the  p-plane,  correspond  to  the  roots 
of  the  characteristic  function  of  the  differential 
equation.  Physical  restrictions  on  the  location 
of  infinite-gain  points  will  be  considered  in  Sec- 
tion A.9. 
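The passage from the differential equation to the transmission function can be illustrated numerically. The sketch below is an illustration only, not part of the original text. It evaluates a ratio of polynomials in p at p = i*omega and reports gain and phase. The function name `transmission` and the first-order coefficients are hypothetical, chosen so that the result is easy to check by hand.

```python
import cmath
import math

def transmission(a, b, p):
    """Evaluate Y(p) = (a[0] + a[1] p + ...) / (b[0] + b[1] p + ...).

    `a` holds the signal-side coefficients and `b` the response-side
    coefficients of the differential equation, lowest order first.
    """
    num = sum(ak * p**k for k, ak in enumerate(a))
    den = sum(bk * p**k for k, bk in enumerate(b))
    return num / den

# Hypothetical first-order network: 0.5 dV/dt + V = E
a = [1.0]
b = [1.0, 0.5]

omega = 2.0
Y = transmission(a, b, 1j * omega)   # steady-state value at p = i*omega
gain_db = 20.0 * math.log10(abs(Y))  # gain in db
phase = cmath.phase(Y)               # phase in radians
print(abs(Y), gain_db, phase)
```

For these assumed coefficients Y(i*omega) = 1/(1 + i) at omega = 2, so the magnitude is 1/sqrt(2) and the phase is a lag of pi/4. The single pole lies at p = -2, on the negative real axis.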



A relationship between the impulsive admittance
and the transmission function of a network
may be obtained from (7). Putting
$E(t) = e^{pt}$ when t > 0, we get

$$V(t) = e^{pt} \int_{0-}^{t} W(\tau)\,e^{-p\tau}\,d\tau = e^{pt} \int_{0-}^{\infty} W(\tau)\,e^{-p\tau}\,d\tau - e^{pt} \int_{t}^{\infty} W(\tau)\,e^{-p\tau}\,d\tau. \qquad (9)$$

The second term in (9) is a transient term due
to the fact that we have taken $E(t) \equiv 0$ when
t < 0. The first term in (9), which involves the
time only through $e^{pt}$, is the steady-state term.
Comparing this term with (8) we get

$$Y(p) = \int_{0-}^{\infty} W(t)\,e^{-pt}\,dt \qquad (10)$$

or, in the notation which will be introduced in
the next section

$$Y(p) = L[W(t)]. \qquad (11)$$
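The statement that the transmission function is the Laplace transform of the impulsive admittance can be checked numerically for a simple case. The following sketch is an illustration under stated assumptions, not a general routine. It takes the hypothetical admittance W(t) = alpha * exp(-alpha * t), whose transform is alpha / (p + alpha), restricts p to real values, and truncates the integral at an assumed upper limit `t_max`.

```python
import math

def laplace_numeric(W, p, t_max=40.0, dt=1e-3):
    """Midpoint-rule approximation of the integral of W(t) exp(-p t) dt
    from 0 to t_max, for real p. The truncation at t_max is an assumption
    that W(t) exp(-p t) is negligible beyond that point."""
    n = int(round(t_max / dt))
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * dt
        total += W(t) * math.exp(-p * t) * dt
    return total

alpha = 3.0  # assumed decay rate

def W(t):
    return alpha * math.exp(-alpha * t)

p = 1.5
approx = laplace_numeric(W, p)
exact = alpha / (p + alpha)
print(approx, exact)
```

At p = 0 the same integral gives the area under W(t), which is unity for this choice, in keeping with the weighted-average interpretation.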



The  frequent  use  which  is  made  of  the 
Laplace  transform  and  its  inverse,  in  the 
analysis  and  design  of  fixed  linear  networks, 
warrants a brief discussion of these transforms.

Given a function f(t) which is identically
zero when t < 0, its Laplace transform g(p) is
defined by the formula

$$g(p) = L[f(t)] = \int_{0-}^{\infty} f(t)\,e^{-pt}\,dt. \qquad (12)$$

This is usually written with 0 for the lower
limit, but by having the point t = 0 inside the
range of integration, instead of at the end, we
secure the same advantages for (12) that we
gained in the case of (2) by having the point
$\lambda = t$ inside the range of integration. Since f(t)
is identically zero when t < 0 we could write
$-\infty$ for the lower limit in (12), but this would
run the risk of confusion with the so-called
"bilateral Laplace transform." On the whole,
it is worth while to have a constant reminder
that functions f(t) which are not identically
zero when t < 0 are ruled out.

The  integral  in  (12)  is  usually  not  con- 
vergent for  all  values  of  p.  That  is,  in  order  to 
secure  convergence  of  the  integral,  it  may  be 
necessary  to  assume  R(p)  >a,  where  R(p)  is 
the  real  part  of  p,  and  a  is  a  real  number.  The 




result  of  the  integration  is  a  representation  of 
g(p)  in  the  half-plane  R(p)  >  a.  Since  the 
representation  is  analytic  throughout  the  half- 
plane,  the  principle  of  analytic  continuation 
allows  us  to  extend  the  definition  of  g(p)  to 
the remainder of the p-plane.

Given a function g(p) which is analytic
throughout the half-plane R(p) > c where c is
a real number, its inverse Laplace transform
f(t) is given by the formula

$$f(t) = L^{-1}[g(p)] = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} g(p)\,e^{pt}\,dp \qquad (13)$$

provided f(t) is identically zero when t < 0.
If the result of the integration in (13) is not
identically zero when t < 0, g(p) is not a
Laplace transform and the application of the
inverse transformation to it is meaningless.

Translation  Theorem 

A useful theorem can be established at this
point. This is the translation theorem. If

$$G(p) = L[F(t)]$$

then

$$L^{-1}[G(p)\,e^{-ap}] = F(t - a)$$

provided that $F(t - a) \equiv 0$ when t < 0. Translation
is to the right or left according as a is
positive or negative.

If it happens that $F(t) \equiv 0$ when $t < t_0$
where $t_0 > 0$, then the restriction is that
$a \geq -t_0$. That is, a limited amount of translation
to the left is permissible. In general, $t_0 = 0$
and the restriction is therefore that $a \geq 0$. This
theorem follows readily from (12) or (13).

In all of the applications of (13) which we
have any occasion to make in the analysis and
design of fixed linear networks, the function
g(p) may be resolved into a sum of terms of
the form $G(p)\,e^{-pa}$ where $a \geq 0$ and G(p) is a
rational algebraic function with real coefficients.
Making use of the translation theorem,
the problem of evaluating $L^{-1}[g(p)]$ reduces to
that of evaluating $L^{-1}[G(p)]$. Now, G(p) may
be resolved into a sum of terms of the form
$p^m$ or $1/(p - \alpha)^{m+1}$ where m = 0, 1, 2, .... We
shall consider these two cases separately.

The case $G(p) = p^m$ will be treated by means
of (12) and some limiting processes. In Section A.1
the unit impulse was regarded as the
limit of a rectangular pulse of duration T and
amplitude 1/T. By means of (12) the Laplace
transform of such a pulse over the interval
0 < t < T is

$$\frac{1 - e^{-pT}}{pT}.$$

Hence

$$L[\delta_0(t)] = \lim_{T \to 0} \frac{1 - e^{-pT}}{pT} = 1.$$

Formally therefore

$$L^{-1}[1] = \delta_0(t). \qquad (14)$$

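Similarly, the Laplace transform of a pulse
over the interval a < t < a + T where a > 0 is

$$e^{-pa}\,\frac{1 - e^{-pT}}{pT}.$$

Hence

$$L[\delta_0(t - a)] = \lim_{T \to 0} e^{-pa}\,\frac{1 - e^{-pT}}{pT} = e^{-pa}.$$

Formally therefore

$$L^{-1}[e^{-pa}] = \delta_0(t - a).$$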

The  last  result  follows  directly  from  (14)  using 
the  translation  theorem. 
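The limiting process behind these two results can be observed numerically. The sketch below is illustrative only. It evaluates the transform of a pulse of height 1/T on the interval a < t < a + T, written here as exp(-p a) (1 - exp(-p T)) / (p T), for decreasing T at an arbitrarily chosen real value of p. The function name `pulse_transform` and the numerical values are assumptions.

```python
import math

def pulse_transform(p, T, a=0.0):
    """Laplace transform of a pulse of height 1/T on a < t < a + T,
    evaluated for real p."""
    return math.exp(-p * a) * (1.0 - math.exp(-p * T)) / (p * T)

p = 2.0  # arbitrary real test value
values = [pulse_transform(p, T) for T in (1.0, 0.1, 0.001)]
shifted = pulse_transform(p, 1e-6, a=0.5)
print(values, shifted, math.exp(-p * 0.5))
```

As T shrinks the unshifted values approach 1, and the shifted value approaches exp(-p a), which is the effect of the translation theorem on the unit impulse.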
Next, let

$$\delta_1(t) = \lim_{T \to 0} \frac{\delta_0(t) - \delta_0(t - T)}{T}.$$

This is the limiting case, as shown in Figure 5,
of two impulses of strengths 1/T and -1/T
separated by a time interval T. It may be called
an impulse of second order. By (12) and the
previous results

$$L[\delta_1(t)] = \lim_{T \to 0} \frac{1 - e^{-pT}}{T} = p.$$

Figure 5. An impulse doublet.

Formally therefore

$$L^{-1}[p] = \delta_1(t).$$

Proceeding in this fashion we may define an
impulse of (m + 1)th order as

$$\delta_m(t) = \lim_{T \to 0} \frac{\delta_{m-1}(t) - \delta_{m-1}(t - T)}{T}$$

and we may then show that

$$L[\delta_m(t)] = p^m.$$

Formally therefore

$$L^{-1}[p^m] = \delta_m(t).$$

This disposes of the case $G(p) = p^m$ where
m = 0, 1, 2, ....

The case $G(p) = 1/(p - \alpha)^{m+1}$ will be treated
by means of (13) and Jordan's lemma.

Jordan's Lemma

If all the singularities of G(p) can be enclosed
by a circle of finite radius with center at
the origin, and if $G(p) \to 0$ uniformly with
respect to arg p as $|p| \to \infty$, then

$$\lim_{P \to \infty} \left[ \int_{\Gamma} G(p)\,e^{pt}\,dp \right] = 0$$

where $\Gamma$ is a semicircle of radius P, with center
at the origin, to the right of the imaginary axis
if t is negative, to the left of the imaginary axis
if t is positive.

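By the use of this lemma the contour of integration
in (13) may be closed and the integration
may then be performed by the method of
residues. In the case

$$G(p) = \frac{1}{(p - \alpha)^{m+1}}$$

where m = 0, 1, 2, ..., we readily obtain

$$L^{-1}\left[\frac{1}{(p - \alpha)^{m+1}}\right] = \begin{cases} 0 & t < 0 \\ \dfrac{t^m e^{\alpha t}}{m!} & t > 0. \end{cases} \qquad (18)$$

An important special case of (18), corresponding
to $\alpha = 0$, is

$$L^{-1}\left[\frac{1}{p^{m+1}}\right] = \frac{t^m}{m!} \qquad (t > 0). \qquad (19)$$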

Another  useful  theorem  which  is  readily 
established  by  means  of  (12)  and  (13)  is 
Borel's  theorem. 

Borel's  Theorem 

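If $g(p)$, $g_1(p)$, $g_2(p)$ are the Laplace transforms
of $f(t)$, $f_1(t)$, $f_2(t)$, respectively, and if

$$g(p) = g_1(p) \cdot g_2(p)$$

then

$$f(t) = \int_{0-}^{t+} f_1(t - \lambda) \cdot f_2(\lambda)\,d\lambda = \int_{0-}^{t+} f_1(\tau) \cdot f_2(t - \tau)\,d\tau.$$

The functions $f_1(t)$ and $f_2(t)$ are subject to
conditions which permit the inversion of the
order of integration in the following proof.
However, these conditions are seldom of any
concern. We have

$$f(t) = L^{-1}\{g_1(p) \cdot L[f_2(t)]\} = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} g_1(p)\,e^{pt}\,dp \int_{0-}^{\infty} f_2(\lambda)\,e^{-p\lambda}\,d\lambda.$$

Inverting the order of integration and noting that

$$\frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} g_1(p)\,e^{p(t - \lambda)}\,dp = \begin{cases} 0 & \text{if } \lambda > t \\ f_1(t - \lambda) & \text{if } \lambda < t \end{cases}$$

we obtain the result stated in the theorem.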
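Borel's theorem can be checked numerically on a pair of functions whose transforms are known. The sketch below is an illustration, not part of the original text. It assumes f1(t) = exp(-t) and f2(t) = exp(-2t), with transforms 1/(p + 1) and 1/(p + 2). The product of the transforms splits into the partial fractions 1/(p + 1) - 1/(p + 2), whose inverse is exp(-t) - exp(-2t), and the convolution integral should reproduce that value. The function name `convolve_at` is hypothetical.

```python
import math

def convolve_at(f1, f2, t, n=20000):
    """Midpoint-rule approximation of the integral from 0 to t of
    f1(t - lam) * f2(lam) d(lam)."""
    h = t / n
    total = 0.0
    for k in range(n):
        lam = (k + 0.5) * h
        total += f1(t - lam) * f2(lam)
    return total * h

def f1(t):
    return math.exp(-t)        # transform 1/(p + 1)

def f2(t):
    return math.exp(-2.0 * t)  # transform 1/(p + 2)

t = 0.8
numeric = convolve_at(f1, f2, t)
closed_form = math.exp(-t) - math.exp(-2.0 * t)
print(numeric, closed_form)
```

Interchanging the roles of f1 and f2 leaves the result unchanged, which corresponds to the two equivalent forms of the integral in the theorem.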


The result (8) obtained in Section A.2 suggests
an operational expression of the form

$$V(t) = Y(p) \cdot E(t) \qquad (20)$$

for the response-to-signal relationship whatever
the signal E(t) might be. If the equivalence
of this operational expression to (2) is
taken as a matter of definition we may readily
discover the nature of the implied operation.

In the light of Borel's theorem, (2) may be
expressed in the form

$$L[V(t)] = L[W(t)] \cdot L[E(t)]$$

under the permissible assumption that $E(t) \equiv 0$
when t < 0. Hence

$$V(t) = L^{-1}\{L[W(t)] \cdot L[E(t)]\}$$

or, by (11)

$$V(t) = L^{-1}\{Y(p) \cdot L[E(t)]\}. \qquad (21)$$

This is, therefore, in general the meaning of
the operational expression (20). [d]

d We note that if $S(p) = L[E(t)]$, the operational
expression

$$V(t) = S(p) \cdot W(t)$$

is equivalent to (20). This form is used in Section 10.4
and in Appendix B.




The symmetry of the impulsive admittance
is expressed by

$$W(T - t) = W(t).$$

Since $W(t) \equiv 0$ when t < 0, it must be so also
when t > T. Hence

$$Y(p) = \int_{0}^{T/2} W(t)\,e^{-pt}\,dt + \int_{T/2}^{T} W(t)\,e^{-pt}\,dt.$$

By a change of variable of integration the second
term may be expressed in the form

$$\int_{0}^{T/2} W(T - t)\,e^{-p(T - t)}\,dt$$

or, because of the symmetry,

$$e^{-pT} \int_{0}^{T/2} W(t)\,e^{pt}\,dt.$$

Hence, if the first term in Y(p) be

$$Y_1(p) = \int_{0}^{T/2} W(t)\,e^{-pt}\,dt$$

we have

$$Y(p) = Y_1(p) + Y_1(-p)\,e^{-pT} = [Y_1(p)\,e^{pT/2} + Y_1(-p)\,e^{-pT/2}]\,e^{-pT/2}.$$

At real frequencies ($p = i\omega$) the bracketed factor
is evidently an even real function of $\omega$. Hence

$$Y(i\omega) = Q(\omega^2) \cdot e^{-i\omega T/2}. \qquad (24)$$

Apart from discontinuities in the phase angle
of the transmission function at real frequencies
$\omega$ for which $Q(\omega^2)$ is zero, the phase angle is
proportional to frequency. Such a transmission
function is referred to as a linear phase transmission
function. Sinusoidal components of the
signal, of frequencies less than the lowest frequency
at which $Q(\omega^2)$ vanishes, suffer phase
retardations in transmission in proportion to
their frequencies. These components therefore
contribute no delay distortion. They are delayed
by a uniform amount, just as they are in a
properly terminated distortionless, uniform
transmission line, although in the case of (24)
they contribute amplitude or loss distortion
through $Q(\omega^2)$. The delay in (24) is just half
of the "smoothing time" T.


Two useful series relationships between impulsive
admittances and transmission functions
will be derived in this section.

Assume that W(t) admits the series expansion

$$W(t) = A_0 + A_1 t + \cdots + A_m \frac{t^m}{m!} + \cdots \qquad (25)$$

for small positive values of t. Then by (11)
and (19)

$$Y(p) = \frac{A_0}{p} + \frac{A_1}{p^2} + \cdots + \frac{A_m}{p^{m+1}} + \cdots.$$

If $A_0 \neq 0$ the transmission cannot drop off
faster than 6 db per octave as the frequency
increases indefinitely. If the transmission is to
drop off ultimately at the rate of 6k db per
octave all of the A's up to and including $A_{k-2}$
must be zero. This is to say that the impulsive
admittance and all of its derivatives of orders
up to and including the (k - 2)th must vanish
at t = 0.

Next, let us suppose that the impulsive admittance
and all of its derivatives of orders up
to and including the (k - 2)th are continuous
through all values of t including t = 0 except
that the (k - 2)th derivative is discontinuous
only at t = a. We may resolve the impulsive
admittance into the sum $W_1(t) + W_2(t)$ where
$W_1(t)$ and all of its derivatives of orders up to
and including the (k - 2)th are continuous
through all values of t including t = 0, while
$W_2(t) \equiv 0$ for all values of t < a. Then, for
small positive values of t - a

$$W_2(t) = \frac{A_{k-2}\,(t - a)^{k-2}}{(k - 2)!} + \cdots \qquad (A_{k-2} \neq 0).$$


Hence  the  transmission  cannot  drop  off  ulti- 
mately faster  than  6(k  —  1)  db  per  octave.  We 
may  summarize  these  results  in  the  asymptotic 
loss  theorem. 

Asymptotic  Loss  Theorem. 

If the transmission is to drop off ultimately
at the rate of 6k db per octave as the frequency
increases indefinitely, the impulsive admittance
and all of its derivatives of orders up to and
including the (k - 2)th must be continuous
through all values of t including t = 0.

Discontinuities  in  W(t)  or  in  some  deriva- 
tive of  W(t)  cannot  occur  except  at  t  =  0  in 
the  case  of  physical  lumped  element  networks. 
Practically,  however,  rapid  changes  in  W(t) 




or in some derivative of W(t), at any value of
t, may be expected to be associated with much
the same behavior of the transmission at reasonably
high frequencies. As an example consider
the case

$$W(t) = \frac{e^{-\alpha t} - e^{-\beta t}}{\beta - \alpha} \qquad (\beta > \alpha > 0),$$

$$Y(p) = \frac{1}{(p + \alpha)(p + \beta)}.$$

W(t) is continuous through t = 0 as long as $\beta$
is finite but becomes discontinuous there in the
limit as $\beta \to \infty$. The first derivative of W(t)
is discontinuous through t = 0 even when $\beta$ is
finite. The ultimate slope of the transmission is
12 db per octave, in accordance with the
asymptotic loss theorem, but in the range
$\alpha < \omega < \beta$ the transmission appears to have a
slope of only 6 db per octave.
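The two slopes mentioned in this example can be confirmed numerically. The following is a sketch for illustration only. It assumes a transmission function with two real poles at -alpha and -beta and a constant factor of unity, since a constant factor does not affect the slope. The values alpha = 1 and beta = 10^4, and the function names, are assumptions chosen to separate the two frequency ranges widely.

```python
import math

def gain_db(alpha, beta, omega):
    """Gain in db of Y(p) = 1 / ((p + alpha)(p + beta)) at p = i*omega."""
    p = 1j * omega
    return 20.0 * math.log10(abs(1.0 / ((p + alpha) * (p + beta))))

def slope_db_per_octave(alpha, beta, omega):
    """Drop in gain between omega and 2*omega, in db."""
    return gain_db(alpha, beta, omega) - gain_db(alpha, beta, 2.0 * omega)

alpha, beta = 1.0, 1.0e4
mid = slope_db_per_octave(alpha, beta, 100.0)   # alpha << omega << beta
high = slope_db_per_octave(alpha, beta, 1.0e7)  # omega >> beta
print(mid, high)
```

In the intermediate range only the pole at -alpha contributes appreciably, giving close to 6 db per octave. Well above beta both poles contribute and the slope approaches 12 db per octave.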

The  importance  of  the  observations  made  in 
the  preceding  paragraph,  in  the  design  of  a 
network,  is  that  if  we  attempt  to  approximate 
a  W(t)  which  has  a  discontinuity  in  a  deriva- 
tive of  lower  order  at  t  =  a  than  at  t  =  0,  the 
fact  that  the  physical  approximation  must  have 
continuous  derivatives  of  all  orders  and  through 
all values of t except t = 0 is not very significant. The ultimate slope of the transmission
may  not  be  reached  until  the  frequency  is  too 
high  to  be  of  any  importance. 

Another useful relationship between impulsive
admittance and transmission function follows
from the assumption that $\int_{0}^{\infty} t^m\,W(t)\,dt$
is finite for m = 0, 1, 2, .... If we expand the
exponential in

$$Y(p) = \int_{0}^{\infty} W(t)\,e^{-pt}\,dt$$

into a power series in pt we get

$$Y(p) = M_0 - M_1 p + \frac{M_2}{2!} p^2 - \cdots$$

where

$$M_m = \int_{0}^{\infty} t^m\,W(t)\,dt.$$

The quantity $M_m$ is the mth moment of the impulsive
admittance.

When $M_0 = 1$ we speak of the response of the
network as a weighted average of the impressed
signal, and speak of the impulsive admittance
W(t) as the weighting function.


The transmission function Y(p) of a lumped
element network is a rational algebraic function
of p. It is real for real values of p (A.2).
Hence, the coefficients must be real, and therefore
the roots and poles must either be real or
occur in conjugate complex pairs.

Such a function may be expanded into the
sum of a polynomial and a rational function
whose numerator is of lower degree than the
denominator. The latter may therefore be properly
expanded into partial fractions. For a
partial fraction of the form

$$\frac{1}{(p - \alpha)^m} \qquad \text{where } m = 1, 2, \ldots$$

the contribution to the impulsive admittance
W(t) is by (18)

$$L^{-1}\left[\frac{1}{(p - \alpha)^m}\right] = \frac{t^{m-1} e^{\alpha t}}{(m - 1)!} \qquad (t > 0).$$

For a pair of partial fractions of the form

$$\frac{A + iB}{(p - \alpha + i\beta)^m} + \frac{A - iB}{(p - \alpha - i\beta)^m}$$

the contribution to the impulsive admittance is

$$\frac{2\,t^{m-1} e^{\alpha t}}{(m - 1)!}\,(A \cos \beta t + B \sin \beta t).$$

Since the impulsive admittance is the response
to an impulsive signal it is clear that for
a stable network the impulsive admittance must
be free of terms which increase indefinitely
with time, either on account of an amplitude
factor of the form $e^{\alpha t}$ where $\alpha > 0$, or, in the
event that $\alpha = 0$, on account of an amplitude factor
of the form $t^{m-1}$ where m > 1. Hence, the
physical restrictions on the transmission function are:

1. No poles with positive real parts.

2. Poles on the imaginary p axis must be simple.
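The two restrictions can be expressed as a simple test on a list of poles. The sketch below is illustrative and assumes the poles and their multiplicities are already known. It reads the second restriction as requiring poles on the imaginary axis to be simple, which is what the preceding argument about amplitude factors that grow as a power of t implies. The function name `poles_are_admissible` and the tolerance are hypothetical.

```python
def poles_are_admissible(poles, tol=1e-12):
    """Check the two restrictions on a list of (pole, multiplicity) pairs.

    1. No pole may have a positive real part.
    2. A pole on the imaginary axis must be simple (multiplicity 1).
    """
    for pole, multiplicity in poles:
        real = complex(pole).real
        if real > tol:
            return False  # growing exponential factor
        if abs(real) <= tol and multiplicity > 1:
            return False  # growing power-of-t factor
    return True

damped_pair = [(-1 + 2j, 1), (-1 - 2j, 1)]
right_half = [(0.5, 1)]
double_on_axis = [(1j, 2), (-1j, 2)]
print(poles_are_admissible(damped_pair),
      poles_are_admissible(right_half),
      poles_are_admissible(double_on_axis))
```

A simple pole on the imaginary axis passes this test, although such poles give persistent transients.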

The  poles  of  a  passive  transmission  function 
correspond  to  modes  of  free  motion.lsh  Each  of 
them  may  be  shownlM  to  satisfy  an  equation  of 
the  form 

$$pT + F + \frac{V}{p} = 0$$

where  T,  F,  V  are  positive  quantities  whose 
values  depend  upon  the  particular  mode  and 

•  Poles  on  the  imaginary  p  axis  must  also  be  ruled 
out  on  the  ground  that  persistent  transients  cannot  be 
tolerated  any  more  than  growir 




its  activity.  However,  T  is  zero  in  the  absence 
of  kinetic  energy,  F  is  zero  in  the  absence  of 
energy  dissipation,  and  V  is  zero  in  the  absence 
of  potential  energy.  It  follows  that  in  the 
absence  of  coils  or  in  the  absence  of  condensers, 
the  transmission  function  must  have  poles  only 
on  the  negative  real  p  axis. 

For  extremely  narrow-band,  low-pass  appli- 
cations, such  as  data  smoothing,  it  is  not  prac- 
ticable to  build  networks  which  call  for  coils 
because  these  generally  turn  out  to  be  of  many 
thousands  of  henries  in  inductance.  The  exclu- 
sion of  coils  from  these  applications  does  not, 
however,  rule  out  transmission  functions  with 
complex  poles.  These  may  be  realized  with  RC 
networks  in  feedback  amplifier  circuits  as  is 
shown  in  Chapter  12. 


A quasi-distortionless transmission network
is one which is distortionless only in a certain
sense. This sense will be made clear in this
section. Let

$$Y(p) = \frac{1 + a_1 p + a_2 p^2 + \cdots + a_m p^m}{1 + b_1 p + b_2 p^2 + \cdots + b_n p^n}. \qquad (29)$$

This may also be written in the form

$$Y(p) = 1 + c_1 p + \frac{c_2}{2!} p^2 + \cdots + \frac{c_r}{r!} p^r + p^{r+1} g(p).$$

Obviously g(p) will be a rational function with
the same denominator as Y(p) and a numerator
of (n - 1)th degree. If we now apply a signal
of the form

$$E(t) = \begin{cases} 0 & \text{for } t < 0 \\ t^r & \text{for } t > 0 \end{cases}$$

the response, by (21), will be

$$V(t) = t^r + r c_1 t^{r-1} + \frac{r(r-1)}{2!} c_2 t^{r-2} + \cdots + c_r + r!\,L^{-1}[g(p)] \qquad (t > 0).$$

If the coefficients in the rational expression for
Y(p) are such that

$$c_1 = t_f, \quad c_2 = t_f^2, \quad \ldots, \quad c_r = t_f^r$$

then

$$V(t) = (t + t_f)^r + r!\,L^{-1}[g(p)] \qquad (t > 0). \qquad (32)$$

The second term vanishes exponentially with
time. The first term is an advanced or a retarded
facsimile of the applied signal according
to whether $t_f$ is positive or negative. We
shall say that Y(p) is the transmission function
of a network which is quasi-distortionless
to the signal $t^r$.

Obviously a transmission network which is
quasi-distortionless to the signal $t^r$ must also be
quasi-distortionless to every signal $t^s$ where s
is a positive integer less than r, including zero.
Hence we may state the quasi-distortionless
transmission theorem.

Quasi-Distortionless Transmission

If the signal

$$E(t) = \begin{cases} 0 & \text{for } t < 0 \\ \text{polynomial of degree } r \text{ at most in } t & \text{for } t > 0 \end{cases}$$

is applied to a "quasi-distortionless transmission
network of order r," the response will be
of the form

$$V(t) = E(t + t_f) + O(e^{-t}) \qquad \text{for } t > 0,$$

where $O(e^{-t})$ stands for terms which vanish
exponentially with time.

If $t_f > 0$ the transmission network is a predictor
for polynomials of degree $r$ at most.
However, it does not begin to predict properly
until some time has elapsed after the start of
the signal, or of a new analytic segment of the
signal; that is, until the transients have subsided
sufficiently.
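The predictor property can be illustrated numerically. The sketch below is not taken from the text: it assumes the simplest rational transmission function, $Y(p) = (1 + a_1 p)/(1 + b_1 p)$, for which $c_1 = a_1 - b_1$, so that the network is quasi-distortionless of order 1 with $t_f = a_1 - b_1$. The element values are hypothetical. The corresponding differential equation, $b_1 V' + V = E + a_1 E'$, is integrated for the ramp $E(t) = t$ with a Runge-Kutta step, and the response is compared with the advanced signal $t + t_f$.

```python
import math


def simulate_ramp_response(a1, b1, t_end, dt=1e-3):
    """Integrate b1*V' + V = E + a1*E' for E(t) = t (t > 0), V(0) = 0, by RK4."""
    def dV(t, v):
        # For E(t) = t we have E'(t) = 1, so the right-hand side is t + a1.
        return (t + a1 - v) / b1

    steps = int(round(t_end / dt))
    t, v = 0.0, 0.0
    for _ in range(steps):
        k1 = dV(t, v)
        k2 = dV(t + dt / 2, v + dt * k1 / 2)
        k3 = dV(t + dt / 2, v + dt * k2 / 2)
        k4 = dV(t + dt, v + dt * k3)
        v += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return v


a1, b1 = 1.5, 0.5   # hypothetical element values (assumption)
t_f = a1 - b1       # prediction time of this first-order example

for t_end in (0.5, 2.0, 10.0):
    v = simulate_ramp_response(a1, b1, t_end)
    # The last column is the transient, which should decay like exp(-t/b1).
    print(t_end, v, t_end + t_f, v - (t_end + t_f))
```

The printed difference is the transient term; the prediction becomes accurate only after it has subsided, as stated above.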

If $t_f = 0$ the transmission network may be
regarded as a delay-corrected smoother for
polynomials of degree $r$ at most. This is obtained
simply by taking

$$a_1 = b_1, \quad a_2 = b_2, \quad \cdots, \quad a_r = b_r$$

in (29).

A.11


A variable linear transmission network is
one in which the response $V(t)$ is related to the
impressed signal $E(t)$ by the linear differential
equation (1) with coefficients which are prescribed
functions of $t$. The solutions of such a
differential equation also obey the superposition
principle. Thus it is possible in this case
also to formulate the response of the network
to any signal in terms of its response to a
standard impulsive signal.

The response of a variable network to an
impulse or any form of signal depends, however,
on the time at which the signal is applied.
For an impulsive signal applied at time $\lambda$ the
response at time $t$ will be represented by
$W(t,\lambda)$. This is still called the "impulsive admittance."
In the theory of linear differential
equations it is known as a Green's function.
Physically, it must be identically zero for
$t < \lambda$.

The superposition theorem may now be written
in the form

$$V(t) = \int_{0}^{t+} E(\lambda) \cdot W(t,\lambda)\, d\lambda \qquad (34)$$

provided the network has been properly designed
and set into operation at $t = 0$. If

$$\int_{0}^{t+} W(t,\lambda)\, d\lambda = 1$$

for all values of $t > 0$, the response may be
interpreted as a weighted average of the signal.
We note that in order to interpret the
response as a weighted average of the signal,
it is now no longer necessary to take the lower
limit in (34) as $-\infty$, as it was in the case of
(2) for a fixed network. In other words, a
variable network can be designed and set into
operation at any time so that components of
the signal which arrive before that time are
completely ignored.

The  analysis  and  design  of  variable  linear 
networks are in general much more difficult
than those of fixed linear networks. This is due
largely  to  the  fact  that  there  does  not  yet  exist 
a  technique  corresponding  to  the  steady-state 
and  operational  methods  used  in  connection 
with  fixed  networks.  However,  there  is  a  class 
of  variable  networks  whose  analysis  and  design 
are  greatly  facilitated  by  the  fact  that  they  are 
related  to  fixed  networks  by  a  transformation 
of  the  time  variable. 

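Consider the linear differential equation

$$b_n \frac{d^n V}{dz^n} + b_{n-1} \frac{d^{n-1} V}{dz^{n-1}} + \cdots + b_1 \frac{dV}{dz} + V = E$$

with constant coefficients. With appropriate
restrictions on the roots of the characteristic

$$b_n \lambda^n + b_{n-1} \lambda^{n-1} + \cdots + b_1 \lambda + 1$$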

it represents the response-to-signal relationship
in a fixed network, if $z$ is proportional
directly to time. However, if $z$ is a more general
function of the time, it will correspond to
a variable network. The kind of transformation
which is desired here is one which transforms
the range $-\infty < z < +\infty$ into the range
$0 < t < +\infty$ with a one-to-one correspondence.
Thus, we may take $z = \log \theta(t)$ where $\theta(t)$ is a
positive monotonic increasing function of $t$ in
the range $0 < t < +\infty$, with $\lim_{t \to 0} \theta(t) = 0$. Several
examples of $\theta(t)$, including $\theta(t) = t$, are
considered in detail in Chapter 14.
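As an illustration of the transformation (a sketch, not the treatment of Chapter 14), take $\theta(t) = t$ and the first-order equation $b_1\, dV/dz + V = E$. Since $z = \log t$ gives $d/dz = t\, d/dt$, the variable network obeys $b_1 t\, dV/dt + V = E$. With the assumption that the response stays bounded as $t \to 0$, its impulsive admittance works out to $W(t,\lambda) = a \lambda^{a-1} / t^{a}$ for $0 < \lambda \le t$, where $a = 1/b_1$. The function names and the value of $b_1$ below are hypothetical. The code checks numerically that these weights integrate to unity over $0 < \lambda < t$, so that the response is a weighted average of the signal received since $t = 0$.

```python
def weight(t, lam, b1):
    """Impulsive admittance W(t, lam) of b1*t*dV/dt + V = E, for 0 < lam <= t."""
    a = 1.0 / b1
    return a * lam ** (a - 1.0) / t ** a


def response(signal, t, b1, steps=2000):
    """V(t) = integral over 0 < lam < t of E(lam) * W(t, lam), midpoint rule."""
    h = t / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        total += signal(lam) * weight(t, lam, b1)
    return total * h


b1 = 0.5  # hypothetical coefficient (assumption), giving W = 2*lam / t**2
for t in (0.1, 1.0, 3.0):
    unit = response(lambda lam: 1.0, t, b1)   # area under the weights
    ramp = response(lambda lam: lam, t, b1)   # weighted average of a ramp
    print(t, unit, ramp)
```

For this choice of $b_1$ the area under the weights is unity at every $t$, and the ramp is averaged to $2t/3$; no transient from the time before $t = 0$ appears.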





Best smoothing or weighting functions have
been determined in Chapters 10 and 11
under the assumption of random noise with flat
spectrum. It has not been worth while in practice
to base the choice of best weighting functions
on any more elaborate considerations of
actual noise spectra, for at least three reasons:

1.  The  effectiveness  of  a  smoothing  network 
shape  of  the  weighting  function. 

2. Noise spectra are subject to variations,
due to factors which it is not desirable in practice
to attempt to control.

3. Elaborate smoothing functions require
elaborate networks with close tolerances on element
values.

Nevertheless, the theory of smoothing presented
in this monograph would not be complete
without showing how more general shapes
of noise spectra can be considered. Two methods
are presented here, which are generalizations
of those presented in Sections 10.3 and
10.4, respectively.


Let $g(t)$ be the tracking error, and $W(t)$ the
impulsive admittance of a smoothing and prediction
circuit with smoothing time $T$. Then
the error in prediction due to tracking error
only, is

$$V(t) = \int_0^T g(t - \tau) \cdot W(\tau)\, d\tau.$$

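The impulsive admittance $W(\tau)$ will depend
also upon the time of flight which, for purposes
of analysis, is assumed to be constant. The
mean square error is then

$$\overline{V^2} = \lim_{L \to \infty} \frac{1}{2L} \int_{-L}^{L} V^2(t)\, dt = \int_0^T \!\! \int_0^T W(\tau_1) \cdot C(\tau_1 - \tau_2) \cdot W(\tau_2)\, d\tau_1\, d\tau_2$$

where

$$C(x) = \lim_{L \to \infty} \frac{1}{2L} \int_{-L}^{L} g(\lambda) \cdot g(\lambda + x)\, d\lambda. \qquad (1)$$

$C(x)$ is the autocorrelation of the error time-function
$g(\lambda)$.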

For an nth order smoothing and prediction
circuit $\overline{V^2}$ is now minimized with respect to the
impulsive admittance under the restrictions*

$$\int_0^T \tau^m W(\tau)\, d\tau = (-t_f)^m \qquad (m = 0, 1, 2, \cdots, n). \qquad (2)$$

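Hence $W(\tau)$ must satisfy the integral equation

$$\int_0^T C(t - \tau) \cdot W(\tau)\, d\tau = k_0 + k_1 t + \cdots + k_n t^n \qquad (0 \le t \le T)$$

where the $k_m$ are constants to be determined.
Now, if

$$\int_0^T C(t - \tau) \cdot W_m(\tau)\, d\tau = t^m \qquad (0 \le t \le T) \quad (m = 0, 1, 2, \cdots, n) \qquad (3)$$

then

$$W(\tau) = k_0 W_0(\tau) + k_1 W_1(\tau) + \cdots + k_n W_n(\tau). \qquad (4)$$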

The procedure is then to determine $C(x)$ from
(1), the $W_m(\tau)$ from (3), the $k_m$ from (2) and
(4), and finally $W(\tau)$ from (4). It may be
noted that, in general, every $k_m$ will be a polynomial
of nth degree in $t_f$. Hence the $W_m(\tau)$
appearing here are not the same as those defined
in Chapter 11, although $W(\tau)$ should be
the same if the same $W_0(\tau)$ is used in Chapter
A difficulty of the theory given above is in
the solution of the integral equations (3). This
difficulty is avoided in the theory given in the
next section. However, the integral equations
are easily solved in case of flat random noise,
when $C(x)$ is simply an impulse of strength $K$,
say, at $x = 0$. Then

$$W_m(t) = \frac{t^m}{K}, \qquad 0 < t < T.$$

Since the strength is irrelevant, it may be taken
equal to $T$ so that $W_0(\tau)$ will be normalized.
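The procedure for the flat-noise case can be sketched in a few lines. The code below is an illustrative assumption, not part of the original text: it takes $W_m(\tau) = \tau^m / T$ (impulse strength $K = T$), substitutes (4) into the restrictions (2), and solves the resulting linear equations for the $k_m$ in exact rational arithmetic. The function names are hypothetical.

```python
from fractions import Fraction


def solve(A, b):
    """Gauss-Jordan elimination over exact rationals."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def flat_noise_coefficients(n, T, tf):
    """Return k_0..k_n so that W = sum_j k_j * tau**j / T satisfies (2)."""
    T, tf = Fraction(T), Fraction(tf)
    # Integral over (0, T) of tau**m * (tau**j / T) is T**(m+j) / (m+j+1).
    A = [[T ** (m + j) / (m + j + 1) for j in range(n + 1)]
         for m in range(n + 1)]
    b = [(-tf) ** m for m in range(n + 1)]
    return solve(A, b)


# Linear prediction (n = 1) with smoothing time T = 1 and t_f = 1/2.
print(flat_noise_coefficients(1, 1, Fraction(1, 2)))
```

For $n = 1$ the two equations can also be solved by hand, giving $k_0 = 2(2 + 3t_f/T)$ and $k_1 = -(6/T)(1 + 2t_f/T)$, which is a convenient check on the code.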

*These follow from the discussions in Sections A.8
and A.10, especially equations (27), (28), (30), and





For a linear prediction circuit it is then found

$$W(\tau) = 2\left(2 + \frac{3 t_f}{T}\right) W_0(\tau) - \frac{6}{T}\left(1 + \frac{2 t_f}{T}\right) W_1(\tau).$$

Putting $T = 1$ this may be expressed as

$$W(\tau) = W_0(\tau) + G_1(-t_f)\, W_{01}(\tau)$$

in terms of the $G_m(\tau)$ and $W_{mn}(\tau)$ of Section


The theory of Phillips and Weiss offers the
most direct proof that the best smoothing or
weighting function must be symmetrical, regardless
of the noise power spectrum. The
situation is that of minimizing (1) under only
one of the restrictions (2), viz., the normalizing
condition

$$\int_0^T W(\tau)\, d\tau = 1. \qquad (5)$$

The weighting function is therefore determined,
up to a constant scale factor, by the
condition that

$$\int_0^T C(t - \tau) \cdot W(\tau)\, d\tau = k, \qquad (6)$$

where $k$ is a constant. Substituting $T - t$ for $t$
and $T - \tau$ for $\tau$, we have

$$\int_0^T C(\tau - t) \cdot W(T - \tau)\, d\tau = k. \qquad (7)$$

Since $C(-x) = C(x)$, and since $W(\tau)$ is determined
uniquely by (6) and (5), it follows
from (6) and (7) that

$$W(T - \tau) = W(\tau). \qquad (8)$$
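The symmetry can be checked numerically on a discretized version of (6). The sketch below is an illustration under stated assumptions, not the method of the text: the autocorrelation $C(x) = e^{-3|x|}$ is a hypothetical choice, the interval $0 < \tau < T$ is replaced by $N$ midpoints, and the integral becomes a sum. The resulting linear equations are solved with $k = 1$ and the solution is rescaled to satisfy (5).

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting (floating point)."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / M[r][r]
    return x


def best_weights(autocorr, T=1.0, N=40):
    """Solve the discretized condition (6), then normalize as in (5)."""
    h = T / N
    taus = [(j + 0.5) * h for j in range(N)]
    A = [[autocorr(ti - tj) * h for tj in taus] for ti in taus]
    w = solve(A, [1.0] * N)           # condition (6) with k = 1
    area = sum(w) * h
    return [wi / area for wi in w]    # rescale so the weights have unit area


w = best_weights(lambda x: math.exp(-3.0 * abs(x)))  # hypothetical C(x)
asym = max(abs(a - b) for a, b in zip(w, reversed(w)))
print("largest departure from W(T - tau) = W(tau):", asym)
```

Any even function $C(x)$ for which the discretized equations remain well conditioned could be substituted; the departure from symmetry should stay at the level of rounding error.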


The noise power transmitted through a network
may be expressed in the familiar form

$$P = \int_0^{\infty} N(\omega^2) \cdot |Y(i\omega)|^2\, d\omega$$

where $N(\omega^2)$ is the noise power spectrum and
$Y(p)$ is the transmission function of the network.
Assuming that $N(\omega^2)$ is a rational function
of $\omega^2$, which is finite at all finite values of
$\omega$ including zero, it is possible to determine a
rational function $S(p)$, which has no poles on
or to the right of the imaginary axis in the
p-plane with the exception of the point at infinity,
and such that

$$|S(i\omega)|^2 = N(\omega^2).$$

It may be readily shown that

$$P = \pi \int_0^{\infty} [F(t)]^2\, dt \qquad (9)$$

where $F(t)$ is related to the impulsive admittance
$W(t)$ by the operational equation

$$F(t) = S(p) \cdot W(t) \qquad (10)$$

The  problem  is  now  to  minimize  (9)  under  the 

^  /    Wit)di  =  1  when  <o  >  1.  (ll) 


$$Q(p) = (p + \alpha_1)(p + \alpha_2) \cdots (p + \alpha_m)$$

$$H(p) = (p + \beta_1)(p + \beta_2) \cdots (p + \beta_n)$$

and  ft  is  of  no  consequence.  One  or  more  of  the 
$\alpha$'s, but none of the $\beta$'s, may be zero. Since the
existence of the integral in (9) imposes the
requirement that $F(t)$ have no discontinuities
of higher type than finite jumps in the range
$0 \le t < \infty$, the continuity conditions on $W(t)$
in (10) must depend upon the difference between
$m$ and $n$ in the expressions for $Q(p)$ and

If $m > n$, it is fairly obvious that $W(t)$ must
be differentiable, in the ordinary sense, exactly
$m - n$ times. In other words, $W(t)$ and all its
derivatives up to and including the $(m - n - 1)$th
must be continuous, but the $(m - n)$th
derivative may have finite jumps. If $m < n$ we
must consider the introduction into $W(t)$ of
discontinuities of higher type than finite jumps.
These discontinuities arise in the formal extension
of the concept of differentiation to
functions containing finite jumps.

If a function $\phi(t)$ has a finite jump of amplitude
$A_0$ at $t = a$, the value of $\phi'(t)$ at that
point will be indicated formally as $A_0 \cdot \delta_0(t - a)$
where $\delta_0(t - a)$ is a unit impulse at $t = a$. If
$\phi'(a + 0) - \phi'(a - 0) = A_1$, the value of $\phi''(t)$
at $t = a$ will be indicated formally as
$A_0 \cdot \delta_1(t - a) + A_1 \cdot \delta_0(t - a)$ where $\delta_1(t - a)$ is a
unit doublet at $t = a$. And so on, for higher derivatives
of $\phi(t)$.

The expression (9) is a minimum under the
restriction (11) if $W(t)$ satisfies the differential
equation

$$Q(p) \cdot Q(-p)\, W(t) = \text{const.} \qquad (12)$$

when $0 < t < 1$ and $Y(p)$ the condition

$$\frac{1}{2\pi i} \int_{-i\infty}^{+i\infty} S(p) \cdot S(-p) \cdot Y(p)\, e^{pt}\, dp = \text{const.} \qquad \text{when } 0 < t < 1. \qquad (13)$$

The restriction (11) itself requires that
$W(t) = 0$ when $t > 1$, and

$$\int_0^1 W(t)\, dt = 1. \qquad (14)$$


Case I. ($n = 0$)

The general solution of (12) contains $2m + 1$
constants of integration which are determined
by (14) and the $2m$ continuity conditions that
$W(t)$ and all of its derivatives up to and including
the $(m - 1)$th must vanish at $t = 0$ and
$t = 1$.

Case II. ($n \ne 0$, $m \ge n$)

The general solution of (12) contains $2m + 1$
constants of integration which are reduced
to $2n$ in number by (14) and the $2(m - n)$
continuity conditions that $W(t)$ and all of its
derivatives up to and including the $(m - n - 1)$th
must vanish at $t = 0$ and at $t = 1$. The
remaining $2n$ constants are determined by (13).

The left-hand member of (13) may be formulated
by the method of residues. The expression
for $Y(p)$ should first be separated
into two parts so that

$$Y(p) = Y_L(p) + Y_R(p)\, e^{-p}$$

where $Y_L(p)$ and $Y_R(p)$ are rational functions
of $p$. The path of integration may then be closed
in the left-hand half of the p-plane for the first
part of $Y(p)$, and in the right-hand half for the
second part. Hence, if the sum of the residues
of $S(p) \cdot S(-p) \cdot Y_L(p)\, e^{pt}$ in the left-hand
half of the p-plane be denoted by $\Sigma_L$ and if the
sum of the residues of $S(p) \cdot S(-p) \cdot Y_R(p) \cdot e^{p(t-1)}$
in the right-hand half of the p-plane be
denoted by $\Sigma_R$, then the condition (13) reduces
to

$$\Sigma_L - \Sigma_R = \text{const.} \qquad (15)$$

Case III. ($n \ne 0$, $m < n$)

The $2m + 1$ constants of integration in the
general solution of (12) are first increased to
$2n + 1$ by appending the $2(n - m)$ singularities

$$\delta_0(t),\ \delta_1(t),\ \cdots,\ \delta_{n-m-1}(t)$$

$$\delta_0(t - 1),\ \delta_1(t - 1),\ \cdots,\ \delta_{n-m-1}(t - 1)$$

and then reduced to $2n$ by (14). The remainder
are determined by (13) or (15).
In formulating $Y(p)$ it may be noted that

$$L[\delta_k(t - a)] = p^k e^{-ap} \qquad (a \ge 0).$$

Example of Case I

Let $S(p) = p^m$. The differential equation (12)
requires $W(t)$ to be a polynomial of degree $2m$.
The conditions at $t = 0$ require it to have a
factor $t^m$, and those at $t = 1$, a factor $(1 - t)^m$.
This leaves only (14) to be satisfied. Hence

$$W(t) = \frac{(2m + 1)!}{(m!)^2}\, [t(1 - t)]^m \qquad (0 \le t \le 1)$$

in agreement with (8) of Section 10.8.
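The normalizing factor in the Example of Case I can be confirmed by direct integration. The sketch below (function names are hypothetical) expands $[t(1 - t)]^m$ by the binomial theorem, integrates term by term over $0 \le t \le 1$ in exact rational arithmetic, and multiplies by $(2m + 1)!/(m!)^2$; condition (14) requires the result to be unity.

```python
from fractions import Fraction
from math import comb, factorial


def case_one_weight(m, t):
    """W(t) = (2m+1)!/(m!)^2 * [t(1-t)]^m on 0 <= t <= 1."""
    return factorial(2 * m + 1) / factorial(m) ** 2 * (t * (1 - t)) ** m


def case_one_weight_integral(m):
    """Exact integral of W(t) over 0 <= t <= 1."""
    scale = Fraction(factorial(2 * m + 1), factorial(m) ** 2)
    # [t(1-t)]^m = sum_k C(m,k) (-1)^k t^(m+k); each term integrates to 1/(m+k+1).
    integral = sum(Fraction((-1) ** k * comb(m, k), m + k + 1)
                   for k in range(m + 1))
    return scale * integral


for m in range(5):
    print(m, case_one_weight_integral(m), case_one_weight(m, 0.5))
```

The second printed column is the peak value $W(1/2)$, which grows with $m$ as the weighting function concentrates toward the middle of the smoothing interval.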

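Example of Case II

Let

$$S(p) = \frac{p + \alpha}{p + \beta}.$$

Then, by (12),

$$W(t) = A_0 + A_1 e^{-\alpha t} + A_2 e^{\alpha t} \qquad (0 \le t \le 1)$$

$$Y(p) = \frac{A_0}{p} + \frac{A_1}{p + \alpha} + \frac{A_2}{p - \alpha} - \left[ \frac{A_0}{p} + \frac{A_1 e^{-\alpha}}{p + \alpha} + \frac{A_2 e^{\alpha}}{p - \alpha} \right] e^{-p}$$

[The expression for $\Sigma_L$, the values of the constants which satisfy condition (15), and the resulting general form of $W(t)$ are illegible in the source.]

In the limit as $\alpha \to 0$,

$$W(t) = \frac{1 + \dfrac{\beta^2}{2 + \beta}\, t(1 - t)}{1 + \dfrac{1}{6}\, \dfrac{\beta^2}{2 + \beta}} \qquad (0 \le t \le 1).$$

In terms of expressions (12), Section 11.3,

$$W(t) = \frac{W_0(t) + k\, W_1(t)}{1 + k} \qquad (0 \le t \le 1),$$

where $k = \frac{1}{6} [\beta^2 / (2 + \beta)]$. This is reminiscent
of Stibitz's results mentioned in Section
10.3.

Example of Case III

Let $S(p) = 1/(p + \beta)$. Then, by (12) and the
rule for appending singularities in Case III,

$$W(t) = A_0 + A_1 \delta_0(t) + A_2 \delta_0(t - 1) \qquad (0 \le t \le 1).$$

$$Y(p) = \frac{A_0}{p} + A_1 - \left[ \frac{A_0}{p} - A_2 \right] e^{-p}$$

[The expressions for $\Sigma_L$ and $\Sigma_R$ are illegible in the source.]

Condition (15) is satisfied if

$$A_1 = A_2 = \frac{A_0}{\beta}.$$

Hence

$$W(t) = \frac{1 + \dfrac{1}{\beta} \left[ \delta_0(t) + \delta_0(t - 1) \right]}{1 + \dfrac{2}{\beta}} \qquad (0 \le t \le 1).$$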





1. The Extrapolation, Interpolation and Smoothing of
Stationary Time Series with Engineering Applications,
Norbert Wiener, OSRD 370, Report to the
Services 19, Research Project DIC-6037, The Massachusetts
Institute of Technology, Feb. 1, 1942.
Div. 7-313.1-M2

1a. Ibid., Chapter 1.

2. The Analysis and Design of Servomechanisms,
Herbert Harris, Jr., OSRD 454, Progress Report to
the Services 23, The Massachusetts Institute of
Technology. Div. 7-321.1-M7

3. Behavior and Design of Servomechanisms, Gordon
S. Brown, OSRD 39, Progress Report 2, The Massachusetts
Institute of Technology, November 1940.
Div. 7-321.1-M1

4. Antiaircraft Director T-15, OEMsr-358, Report to
the Services 62, Western Electric Company, Inc.,
August 1943. Div. 7-112.2-M6

5. The Analysis and Synthesis of Linear Servomechanisms,
Albert C. Hall, OSRD 2097, Report to the
Services 64, The Massachusetts Institute of Technology,
May 1943. Div. 7-321.1-M5

6. Antiaircraft Director, T-15-E1, E. L. Norton,
OEMsr-358, Report to the Services 98, Bell Telephone
Laboratories, Inc., July 30, 1945.
Div. 7-112.2-M11

7. Theoretical Calculation on Best Smoothing of Position
Data for Gunnery Prediction, R. S. Phillips
and P. R. Weiss, OEMsr-262, AMP Note 11, Report
532, The Massachusetts Institute of Technology,
Radiation Laboratory, Feb. 16, 1944.
Div. 14-244.4-M1

8. A Long Range, High-Angle Electrical Antiaircraft
Director [Final Report on T-10], C. A. Lovell,
NDCrc-127, Research Project 2, Division 7 Report
to the Services 80, Bell Telephone Laboratories,
Inc., June 24, 1944. Div. 7-112.2-M9

9. Flight Records of Pitch, Roll, and Yaw, taken in
a variety of bombers at Wright Field, Ohio, Sperry
Gyroscope Company, 1942-5.

10. Design and Performance of Data-Smoothing Network,
R. B. Blackman, OEMsr-262, Report MM-44-110-38,
[Bell Telephone Laboratories, Inc.], July 8,

11. Computer for Controlling Bombers from the
Ground, E. Lakatos and H. G. Och, OEMsr-262,
July 24, 1944.

12. A Position and Rate Smoothing Circuit for Ground-Controlled
Bombing Computers, R. B. Blackman,
OEMsr-262, Report MM-44-110-79, [Bell Telephone
Laboratories, Inc.], Aug. 21, 1944.

13. A Two-Servo Circuit for Smoothing Present Position
Coordinates and Rate in Antiaircraft Gun
Directors, R. B. Blackman, Contract W-30-069-ORD-1448,
Report MM-44-110-65, [Bell Telephone
Laboratories, Inc.], Sept. 27, 1944.

14. The Theory of Electrical Artificial Lines and Filters,
A. C. Bartlett, John Wiley and Sons, Inc.,
1931, p. 28.

15. Network Analysis and Feedback Amplifier Design,
H. W. Bode, D. Van Nostrand Company, 1945.

15a. Ibid., Chapters 7, 8, 13, and 14.

15b. Ibid., p. 313.

15c. Ibid., p. 326.

15d. Ibid., p. 301.

15e. Ibid., p. 38.

15f. Ibid., p. 12.

15g. Ibid., p. 78.

15h. Ibid., p. 110.

15i. Ibid., p. 133.

15j. Ibid., Chapter 6.

16. Fundamental Theory of Servo-mechanisms, L. A.
MacColl, D. Van Nostrand Company, 1945.

17. Automatic Control Engineering, E. S. Smith, McGraw-Hill
Book Company, Inc., 1944.

18. Die Lehre von den Kettenbrüchen, B. G. Teubner,
Leipzig, 1913.

19. "Transient Oscillations in Wave Filters," J. R.
Carson and O. J. Zobel, Bell System Technical
Journal, July 1923.

20. "Harmonic Analysis of Irregular Motion," Norbert
Wiener, Journal of Mathematics and Physics,
Vol. 5, 1926, pp. 99-189.

21. "Generalized Harmonic Analysis," Norbert Wiener,
Acta Mathematica, Stockholm, Vol. 55, 1930,
pp. 117-258.

22. "Stochastic Problems in Physics and Astronomy,"
S. Chandrasekhar, Reviews of Modern Physics, Vol.
15, 1943, pp. 1-89.

23. "Mathematical Analysis of Random Noise," S. O.
Rice, Bell System Technical Journal, Vol. 23, 1944,
pp. 282-332.

23a. Ibid., Vol. 24, 1945, pp. 46-156.




Cover Sheet for Technical Memoranda
Research Department

SUBJECT: The Transient Behavior of a Large Number of Four-Terminal
Unilateral Linear Networks Connected in Tandem - Case 20876

MM-46-110-49
DATE: April 10, 1946
AUTHORS: C. L. Dolph, C. E. Shannon
Index No. W1.416

COPIES TO:

1 - H.W.BW.B*F.-H.F#-Case Files
2 - Case Files
3 - L. G. Abraham-T. E. Brewer
4 - C. H. Elmendorf-H. K. Krist
5 - H. S. Black-F. B. Anderson
6 - G. N. Thayer-C. W. Harrison
7 - R. L. Dietzold
8 - L. A. MacColl
9 - B. M. Oliver
10 - C. L. Dolph
11 - C. E. Shannon


Asymptotic expressions for the transient
response of a long chain of four-terminal unilateral
linear networks connected in tandem subject to an
initial disturbance are developed and classified according
to the characteristics of the common transfer ratio.
It  is  shown  that  a  necessary  and  sufficient  condition 
for  the  stability  of  the  chain  for  all  n  is  that  the 
transfer  ratio  be  of  the  high  pass  type. 

The  mathematical  results  are  applied  to 
chains  of  self-regulating  telephone  repeaters. 

The Transient Behavior of a Large Number of Four-Terminal
Unilateral Linear Networks Connected in Tandem - Case 20878

MM-46-110-49
April 10, 1946



The transient response behavior of a long chain of
invariable four-terminal networks connected unilaterally in
tandem is of primary importance in the design of cross-country
wire communication systems, since the successful operation of
such equipment depends upon the rapid damping of transients
caused by suddenly applied inputs.

While the emphasis in the memorandum will be directed
toward coaxial systems consisting of self-regulating repeaters
spaced at 3-7 mile intervals and spanning distant points, the
results are of a more general nature and would apply, with
obvious modifications and corresponding interpretations, to any
configuration involving a large number of four-terminal linear
invariable networks connected unilaterally in tandem.

It will be shown that there are two fundamentally
different types of transient response possible depending upon
the gain characteristic of the transfer ratio of the individual
four-terminal linear networks comprising the system. The first
type of response, while satisfactory, is difficult to achieve in
practice because of the stringent requirements on the gain
characteristic of the transfer ratio. The second, a case often
encountered in practice, will be shown to be unsatisfactory in
general since it leads to build-up and overloading in any
physical system comprising a large number of such networks.
However, a guiding design principle will be suggested which,
it is believed, will enable us to minimize the worst of the
effects, and make the successful operation of a system of the
type envisaged here possible.

This memorandum is divided into two parts. In the
first the problem is defined physically and then formulated
mathematically. Following this, the history of the problem is
discussed briefly after which the new results are summarized.
Finally, this part concludes with a discussion of their interpretation
and implications for the coaxial system. The second
part presents the detailed mathematical arguments which led to
the new results of part one.


Statement  of  the  Problem 

The  analysis  in  this  memorandum  is  directed  toward 
the  understanding  of  certain  anomalous  effects  which  a  long 
chain  of  self-regulating  telephone  repeaters  may  exhibit  at  its 
output  when  the  input  end  of  the  chain  is  subject  to  a  transient 
disturbance  (Cf.  Figure  1). 

The gain settings of the repeaters in such a chain
are usually controlled by the level of a pilot frequency somewhere
in the communication band and the regulation is designed
to compensate for low frequency phenomena (up to approximately
one cycle per second) such as the diurnal change in line resistance.
The repeaters in the chain are normally absolutely
stable devices so that any transient which is presented to the
input of any one of them will be evanescent in time at the
output of that repeater.

Since transients are not damped out instantaneously
even in absolutely stable devices, a transient disturbance at
the input to the first repeater in such a chain will be propagated
down the chain. It has been experimentally observed
that under certain conditions the maximum amplitude of a transient
disturbance may increase as the disturbance is propagated
from one repeater to the next and in some cases there may be
many oscillations of sufficiently large amplitude to render the
system inoperative because of prolonged over-loading.

If  the  entire  chain  from  its  input  to  its  output  end 
is  considered  as  a  whole,  the  chain  does  behave  then  in  many 
respects  like  an  unstable  non-linear  device  in  spite  of  the 
fact  that  each  repeater  in  the  chain  is  absolutely  stable. 

Since it is obvious that the above type of behavior
is at best undesirable in a cross-country link, it is necessary
that its cause be thoroughly understood and that all possible
steps be taken either to suppress it or, if this is not possible,
at least to minimize its effects.

Although it is not reasonable to expect that transient
oscillations can be kept from propagating down the line, or that
it is possible to isolate the line from all transient disturbances,
it is reasonable to seek a means of guaranteeing that the transients
that are propagated down the line will never possess
amplitudes that exceed the magnitude of the original disturbance,
or to seek a way to guarantee that the maximum response of the
transient oscillations will occur so shortly after the initial
disturbance that physical apparatus will be incapable of following
or distinguishing it from the unavoidable initial disturbance.
A way of guaranteeing the first of these will be discussed at
length and a suggestion will be made which it is felt will
guarantee the second, although no rigorous proof of this last
fact has yet been given.

Fig. 2 represents a schematic drawing of a typical
satisfactory type of transient response which might result from
a unit step input to the first unit of Fig. 1. Fig. 3, on the
other hand, represents a schematic drawing of a typical unsatisfactory
type of transient response which could result from the
same input to a system of the type of Fig. 1 which had different
characteristics. Briefly then, the problem to be discussed is
that of determining the relationships between the network
characteristics and the transient response for networks of the
form of Fig. 1.

Mathematical  Formulation  of  the  Problem 

A sudden change in level in the pilot frequency
before the n-th repeater results in the modulation of this
frequency, changing it from its normal form

$$A \sin \omega_0 t$$

to

$$A \sin \omega_0 t\, [1 + f(t)]$$

where $f(t)$ represents the modulation introduced by the transient.

After passage through the n-th repeater, this last
expression is transformed into

$$A \sin (\omega_0 t + \varphi)\, [1 + g(t)],$$

where the repeater and regulator have (possibly) changed the
carrier by the addition of the phase angle $\varphi$ and have modified
the original envelope $A[1 + f(t)]$ into $A[1 + g(t)]$.

It is clear that from the standpoint of regulation
it is sufficient to limit discussion to the transformation
of $f(t)$ into $g(t)$.*

The exact relationship between $f(t)$ and $g(t)$, of course,
depends upon the characteristics of the repeater-regulator circuits
which are in general non-linear. However, for small signal
inputs their behavior may be satisfactorily represented by that
obtained from a linear invariable four-terminal network. Thus,
the chain of self-regulating repeaters may be replaced, for the
purpose of mathematical analysis, by a chain of linear invariable
four-terminal networks having a common transfer ratio $y(p)$. Thus,
the blocks of Fig. 1 will be idealized as being such linear four-terminal
networks throughout the analysis.

Because regulation is designed to compensate for low
frequency phenomena, certain characteristics that $y(p)$ should
possess are known a priori; namely:

(1) $y(p)$ must represent a high-pass system. That is,
$y(p) \to 1$ as $p \to \infty$.

(2) $y(0)$ should be zero if, in the terminology of servo
theory, there is to be no static error.


In  terms  of  y(p),  the  design  of  a  self-regulating 
system  reduces  to  two  problems: 

(I) Given $y(p)$, to calculate the transient behavior of
the chain of self-regulating repeaters.

(II) The design of a system having a $y(p)$ which leads
to satisfactory transient behavior.

The rest of the memorandum will be concerned largely
with the first of these. The calculations will be carried out
in general terms and the different types of possible responses
will be described in terms of the characteristics of $y(p)$.

* Transit time between repeaters is neglected throughout this
memorandum. More exactly, we choose a different origin of time
at each repeater, so that the transit time does not appear explicitly
in the formulae.

Mathematically the problem discussed in this memorandum
can be formulated as follows: If $y(p)$ represents the common
steady-state transfer ratio of the four-terminal linear units
shown connected in tandem in Figure 1, the output voltage response
of the n-th unit $V_n(t)$ is given by the inverse Laplace integral:

$$V_n(t) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} y(p)^n\, e^{pt}\, V_0(p)\, dp$$

where $V_0(p)$ represents the spectrum of the input voltage.

For an impulsive input of intensity $V_0$ applied at
time $t = 0$,

$$V_0(p) = V_0.$$

For a step function input of height $V_0$ applied at
time $t = 0$,

$$V_0(p) = V_0 / p.$$


Specifically,  this  memorandum  will  be  devoted  to  the 
study  of  the  behavior  of  Vn(t)  for  large  values  of  n. 

Four-terminal networks are normally classed as low-,
band-, or high-pass depending upon the character of $|y(i\omega)|$.
Typical examples of $|y(i\omega)|$ are shown in Figure 4a, in which,
following the usual practice, $|y(i\omega)|$ has been normalized to be
unity at $\omega = 0$ in the low-pass case; at $\omega = \omega_0$ (the mid-band
frequency), in the band-pass case; and at $\omega = \infty$ in the high-pass
case.

From the viewpoint of the asymptotic behavior of the
system in Figure 1, it is convenient to modify this classification
somewhat when speaking of the over-all gain characteristic,
$|y(i\omega)|^n$, of the transfer ratio of a system comprised of n units.
For sufficiently large n, it is clear that $|y(i\omega)|^n$ would lead
to curves of the type shown in Figure 4b corresponding to the
low-pass, band-pass and high-pass curves of Figure 4a. Thus,
for sufficiently large n, the gain curves B', C', and D' of
Figure 4b are seen to exhibit the type of behavior normally
associated with a band-pass characteristic. A' and E', on the
other hand, exhibit behavior of the type normally classified as
low-pass and high-pass. For these reasons, the terms low-, and
high-pass will henceforth be reserved for those gain characteristics
which are always less than their values at $\omega = 0$ and
$\omega = \infty$, respectively. The term band-pass will be used to
cover all other cases; namely, those in which $|y(i\omega)|$ possesses
one or more maxima at finite frequencies, the values of which
exceed the values of $|y(i\omega)|$ at both zero and infinity.

History of the Problem

Several  people  have  considered  this  problem  in  the 
above  mathematical  form.    Before  proceeding  to  a  discussion  of 
the  results  of  the  general  theory,  it  will  be  instructive  to 
consider  a  few  illustrative  examples  of  their  results. 


(2)*  $$y(p) = \frac{p}{p+1}$$

The gain characteristic is clearly of the high-pass type and satisfies (1) and (2) of Page 6. If the input voltage is a unit step, then, by the theorem of residues,

$$V_n(t) = \frac{1}{(n-1)!}\left[\frac{d^{n-1}}{dp^{n-1}}\left(p^{n-1}e^{pt}\right)\right]_{p=-1} = e^{-t}L_{n-1}(t)$$

where L_{n-1}(t) denotes the Laguerre polynomial of degree (n-1). A plot of Vn(t) for n = 1, 2, ..., 10 is shown in Figure 5. It is known that for large n

$$L_n(t) \approx \pi^{-1/2}\, e^{t/2}\, (nt)^{-1/4} \cos\left[2(nt)^{1/2} - \frac{\pi}{4}\right]$$

where ≈ is to be interpreted as "asymptotically equal to."

*This example was first treated by L. A. MacColl (MM-39-325<-166), 9/11/39, and W. H. Wise (MM-38-343-22), 8/2/38. The above treatment follows that of MacColl.


A plot of the approximate "envelope"

$$\pi^{-1/2}\, e^{-t/2}\, (nt)^{-1/4}$$

is given for n = 50, 100, 150, 200, and 250 in Figure 6.

The response in this case is seen to be both amplitude and frequency modulated, the "instantaneous frequency" in the sense of frequency modulation theory being given by

$$\omega' = \frac{d}{dt}\left[2(nt)^{1/2}\right] = \left(\frac{n}{t}\right)^{1/2}$$

while the envelope of the amplitude modulation is approximately exponential. In particular, the type of behavior found here can be considered satisfactory since there is no tendency for the magnitude of the largest overshoot to increase without limit as the number of repeaters is increased. As will be shown later, this type of behavior is typical of any network having a high-pass characteristic in the generalized sense of that term as it has been defined above.
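The first example can be checked numerically. The sketch below assumes a unit step input and n cascaded units with y(p) = p/(p + 1), so that the exact response is exp(-t) times a Laguerre polynomial of degree n-1; it compares this with the large-n cosine formula and its envelope. The function names are illustrative, and the Laguerre values come from the standard three-term recurrence rather than from anything in the memorandum.

```python
import math


def laguerre(k, t):
    """Laguerre polynomial L_k(t) via (j+1) L_{j+1} = (2j+1-t) L_j - j L_{j-1}."""
    prev, cur = 1.0, 1.0 - t
    if k == 0:
        return prev
    for j in range(1, k):
        prev, cur = cur, ((2 * j + 1 - t) * cur - j * prev) / (j + 1)
    return cur


def step_response(n, t):
    """Exact unit-step response of n units with y(p) = p/(p+1)."""
    return math.exp(-t) * laguerre(n - 1, t)


def envelope(n, t):
    """Approximate envelope pi^(-1/2) exp(-t/2) (nt)^(-1/4), for t > 0."""
    return math.exp(-t / 2.0) / (math.sqrt(math.pi) * (n * t) ** 0.25)


def asymptotic(n, t):
    """Large-n approximation: envelope times cos(2 sqrt(nt) - pi/4)."""
    return envelope(n, t) * math.cos(2.0 * math.sqrt(n * t) - math.pi / 4.0)


if __name__ == "__main__":
    for n in (50, 100, 200, 400):
        t = 1.0
        print(n, step_response(n, t), asymptotic(n, t), envelope(n, t))
```

The envelope shrinks as n grows, which is the sense in which the overshoot does not build up along the chain.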

In  MM-40-3500-92  dated  10/14/1940,  J.  G.  Kreer  and 
J.  H.  Bollman  concluded  that  the  appropriate  y(p)  for  a  self- 
regulating  repeater  employing  a  directly  heated  thermistor 
element  in  the  control  device  was  given  by 

It should be observed that for α ≠ 0 this transfer ratio does possess static error. L. A. MacColl in MM-40-130-270 treated this case for |α| < 1 and found that the system exhibited essentially the same type of satisfactory behavior as that discussed above.


(2) A slightly more complicated example is given by

(4)  $$y(p) = \frac{p(p+\alpha)}{(p+1)^2}.$$

It is easily seen that for α < √2, |y(iω)| is a high-pass characteristic in that |y(iω)| < 1 for all finite ω and |y(iω)| → 1 as ω → ∞. On the other hand, if α > √2, |y(iω)| possesses a maximum greater than 1 at some finite frequency. |y(iω)| is illustrated by curve I in Figure 7 for α = 1.4 (high-pass) and by Figure 8 for α = 2 (band-pass). The response Vn(t) to a unit step function is shown in Figures 9 and 10 for these two cases with n = 1, 2, ..., 9. The character of the response is seen to be of a radically different kind for these two values of α.

For α = 1.4 the response is seen to be of the same type as that encountered in the first example. For α = 2, on the other hand, it seems to represent an oscillation in which the magnitude of the largest overshoot is increasing without limit as n tends to infinity. Later it will be shown that this is in fact the case and that satisfactory operation is impossible for a large number of repeaters in this case.
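The dividing line between the two kinds of behavior can be located explicitly for the example y(p) = p(p + α)/(p + 1)². Differentiating |y(iω)|² with respect to ω² (a calculation carried out here, not quoted from the memorandum) gives a finite-frequency maximum only when α² > 2, at ω₀ = α/√(α² − 2), with peak gain α²/(2√(α² − 1)). The sketch below, with illustrative function names, checks the closed form against a brute-force scan.

```python
import math


def gain(w, alpha):
    """|y(iw)| for y(p) = p(p + alpha)/(p + 1)^2."""
    return w * math.sqrt(w * w + alpha * alpha) / (w * w + 1.0)


def peak(alpha):
    """Return (w0, peak gain) of the finite-frequency maximum, or None if alpha^2 <= 2."""
    a2 = alpha * alpha
    if a2 <= 2.0:
        return None
    return alpha / math.sqrt(a2 - 2.0), a2 / (2.0 * math.sqrt(a2 - 1.0))


def scanned_maximum(alpha, points=20001, w_max=50.0):
    """Largest gain found on a uniform grid of frequencies in (0, w_max]."""
    return max(gain(w_max * k / points, alpha) for k in range(1, points + 1))


if __name__ == "__main__":
    print(peak(1.4))              # None: alpha = 1.4 lies below sqrt(2)
    print(peak(2.0))              # maximum near w0 = sqrt(2), gain sqrt(4/3)
    print(scanned_maximum(1.4), scanned_maximum(2.0))
```

For α = 1.4 the scan stays below unity, while for α = 2 it reproduces the peak of about 1.155 at ω₀ = √2.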

From this and other considerations L. A. MacColl conjectured that a necessary and sufficient condition that the response Vn(t) be bounded for all n was that the transfer ratio y(p) have no net gain at any frequency. Mathematically expressed, a necessary and sufficient condition that

$$|V_n(t)| < M \quad \text{for all } n,$$

where M is independent of n and t, is that

(M)  $$|y(i\omega)| \le 1 \quad \text{for all real frequencies } \omega.$$

Physically, the condition on y(iω) prevents the transfer ratio |y(iω)|^n for a system using n units from having a tremendous gain at any particular frequency.
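The physical remark can be made concrete by raising the largest sampled gain to the n-th power. The sketch below is an illustration under stated assumptions: the helper names, the logarithmic frequency grid, and the choice of the two values α = 1.4 and α = 2 for the example y(p) = p(p + α)/(p + 1)² are made here for demonstration, and a sampled maximum can only approximate the true supremum.

```python
def sup_gain(y, decades=3, per_decade=100):
    """Largest sampled |y(iw)| on a logarithmic grid from 10^-decades to 10^decades."""
    lo, hi = -decades * per_decade, decades * per_decade
    return max(abs(y(complex(0.0, 10.0 ** (k / per_decade)))) for k in range(lo, hi + 1))


def chain_peak_gain(y, n):
    """Peak of the over-all gain |y(iw)|^n of n identical units on the same grid."""
    return sup_gain(y) ** n


def example(alpha):
    """Transfer ratio y(p) = p(p + alpha)/(p + 1)^2 as a callable."""
    return lambda p: p * (p + alpha) / (p + 1.0) ** 2


if __name__ == "__main__":
    for alpha in (1.4, 2.0):
        for n in (1, 10, 100):
            print(alpha, n, chain_peak_gain(example(alpha), n))
```

With α = 1.4 the peak of the n-unit gain never exceeds unity, whereas with α = 2 a single-unit peak of only about 1.155 becomes a gain of more than a million after 100 units.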

This  case  was  also  treated  by  L.  A.  MacColl,  but  no  memorandum 
on  it  was  ever  written. 

In one sense this memorandum could be summarized as a proof of this conjecture. In particular, a direct proof of the necessity of MacColl's condition (M) is given in the second part. The remainder of that part is devoted to an indirect proof of the sufficiency. The argument consists in exhibiting the two types of possible responses; the first being that associated with a y(p) satisfying MacColl's condition and the second that resulting from a y(p) which violates it at one or more frequencies.

Statement  of  Results 

The detailed results of the sufficiency argument are discussed conveniently in terms of the generalized characterization of high-, band-, and low-pass y(p)'s as given on page 8. The results will be taken up in that order.

High  Pass 

In terms of the above classification, the class of high pass y(p)'s consists of just those functions which satisfy MacColl's condition and are therefore those from which a satisfactory response could be expected. For the y(p)'s in this class, it is clear on physical grounds that the maximum contribution to the response Vn(t) of equation (1) will come from the large values of |ω| since for these values of |ω|, |y(iω)|^n → 1 while for all other values of |ω|, |y(iω)|^n → 0. Using the first three terms of the Laurent expansion of y(iω) about ω = ∞, one finds:

(5)*  $$y(i\omega) = 1 + \frac{ai}{\omega} + \frac{b}{\omega^2},$$

(6)  $$|y(i\omega)| \approx \left[1 + \frac{a^2 + 2b}{\omega^2}\right]^{1/2},$$

(7)  $$\text{Angle } y(i\omega) \approx \frac{a}{\omega}.$$

* It is assumed that a > 0, b < 0, and that 2b + a² < 0. These assumptions correspond to a second order maximum at |ω| = ∞ and to a monotonic decreasing phase function for y(p) as |ω| → ∞.
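For a rational transfer ratio the constants a and b of the expansion about ω = ∞ can be read off numerically. The sketch below estimates them from a single large frequency; the helper name `laurent_ab` and the sample frequency are illustrative choices. For the example y(p) = p(p + α)/(p + 1)², expanding in powers of 1/(iω) by hand (a calculation made here, not stated in the memorandum) gives a = 2 − α and b = 2α − 3, so that a² + 2b = α² − 2.

```python
def laurent_ab(y, w=1.0e3):
    """Estimate a and b in y(iw) ~ 1 + a*i/w + b/w^2 from one large frequency w.

    a is recovered from the imaginary part, b from the real part; the
    neglected terms are of relative order 1/w^2.
    """
    value = y(complex(0.0, w))
    a = w * value.imag
    b = w * w * (value.real - 1.0)
    return a, b


if __name__ == "__main__":
    alpha = 1.4
    a, b = laurent_ab(lambda p: p * (p + alpha) / (p + 1.0) ** 2)
    print(a, b, a * a + 2.0 * b)   # close to 0.6, -0.2, -0.04
```

For α = 1.4 the quantity a² + 2b is only about −0.04, which is small because this value of α lies close to the critical value √2.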


If these approximations, which are valid for |ω| sufficiently large, are introduced into equation (1), it can be shown that the principal contribution to Vn(t) for a unit step input is given by:

(8)  $$V_n(t) \approx \pi^{-1/2}(nat)^{-1/4}\exp\left\{\frac{(a^2+2b)t}{2a}\right\}\cos\left(2\sqrt{nat}-\frac{\pi}{4}\right)$$

This, with a suitable interpretation of the constants a and b, is seen to be of the same general form as the response obtained by MacColl for y(p) = p/(p + 1) as given above. Just as in that example the response is both frequency and amplitude modulated. The instantaneous frequency of oscillation is again given by

$$\omega' = \frac{d}{dt}\left(2\sqrt{nat}\right) = \left(\frac{na}{t}\right)^{1/2}$$


The gain for

$$y(p) = \frac{p(p+\alpha)}{(p+1)^2}$$

[numerical value of α illegible in the source] is shown on curve I of Figure 11. Curve II of this figure represents |y(iω)|^100 for this y(p). For this example and n = 100, the true gain |y(iω)|^100 and the gain approximation resulting from equation (6) are indistinguishable on the scale of Figure 11.

The corresponding phase characteristic for y(p)^100 is plotted on Figure 12 where, for reasons which will appear in Part II, the actual frequency has been replaced by a scaled frequency ω' [defining expression illegible in the source].

Again, on the scale of Figure 12 the actual phase is indistinguishable from the approximation resulting from equation (7). Figs. 7 and 13 present the same information for

$$y(p) = \frac{p(p+1.4)}{(p+1)^2}$$

and n = 100.


Again the agreement between the actual phase and the approximation is excellent. However, there is a considerable error in the gain approximation for small |ω|. This large error is unquestionably due to the fact that the value α = 1.4 is near the critical value α = √2 at which the characteristic changes from high-pass to band-pass.
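The size of this error can be seen by comparing the exact 100-unit gain for α = 1.4 with two truncations of the expansion of |y(iω)|² in powers of 1/ω². The coefficients used below, α² − 2 for the 1/ω² term and 3 − 2α² for the 1/ω⁴ term, were worked out here for y(p) = p(p + α)/(p + 1)² and are not quoted from the memorandum; the function names are illustrative.

```python
import math


def gain_exact(w, alpha):
    """Exact |y(iw)| for y(p) = p(p + alpha)/(p + 1)^2."""
    return w * math.sqrt(w * w + alpha * alpha) / (w * w + 1.0)


def gain_first(w, alpha):
    """First approximation: keep only the 1/w^2 term of |y|^2."""
    s = 1.0 + (alpha * alpha - 2.0) / w ** 2
    return math.sqrt(max(0.0, s))


def gain_second(w, alpha):
    """Second approximation: also keep the 1/w^4 term of |y|^2."""
    s = 1.0 + (alpha * alpha - 2.0) / w ** 2 + (3.0 - 2.0 * alpha * alpha) / w ** 4
    return math.sqrt(max(0.0, s))


if __name__ == "__main__":
    alpha, n = 1.4, 100
    for w in (3.0, 5.0, 10.0, 20.0):
        print(w,
              gain_exact(w, alpha) ** n,
              gain_first(w, alpha) ** n,
              gain_second(w, alpha) ** n)
```

Because α² − 2 is only −0.04 for α = 1.4 while the next coefficient is −0.92, the 1/ω⁴ term matters at moderate frequencies once the gain is raised to the 100th power.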

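Agreement with the above asymptotic formula can of course be obtained by increasing n sufficiently. Alternately, for n = 100, a better approximation to the gain can be obtained by writing

$$y(i\omega) = 1 + \frac{ai}{\omega} + \frac{b}{\omega^2} + \frac{ci}{\omega^3} + \frac{d}{\omega^4},$$

$$|y(i\omega)| \approx \left[1 + \frac{2b + a^2}{\omega^2} + \frac{2d + b^2 + 2ac}{\omega^4}\right]^{1/2}.$$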

This approximation leads to a curve which is indistinguishable from that of |y(iω)|^100 in Figure 7. With this approximation, one finds the following expression for Vn(t) when the input is a unit step function

(9)  $$V_n(t) \approx \pi^{-1/2}(nat)^{-1/4}\cos\left(2\sqrt{nat}-\frac{\pi}{4}\right)\exp\left(\frac{(a^2+2b)t}{2a}\right)\left\{1 + \frac{(2d+b^2+2ac)t^2}{2a^2 n}\right\}$$

This expression is seen to approach that given by equation (8) as n → ∞. Thus one can conclude that the response will always be satisfactory if y(p) belongs to the class of high-pass characteristics.

Band-Pass  Case 

MacColl's condition is clearly violated whenever |y(iω)| has one or more relative maxima greater than 1 at finite frequencies. For simplicity the case where |y(iω)| has only one such maximum at ω = ω₀ will be treated first. It will furthermore be assumed that this maximum is of the second order; i.e.

$$\left.\frac{d^2|y(i\omega)|}{d\omega^2}\right|_{\omega=\omega_0} \ne 0.$$

Under these conditions, it is physically clear that the maximum contribution to the response Vn(t) as given by equation (1) will be due to those frequencies near ω₀, at which |y(iω)| possesses its maximum, since as n increases this region becomes increasingly more important than all the rest. It is also clear that the time of maximum response will be given by the delay time experienced by the frequency ω₀ in passing thru the network. This is known to be given by t₀ = −nB'(ω₀) where B'(ω₀) denotes the slope of the phase characteristic B(ω) in the expression

(10)  $$y(i\omega) = A(\omega)\exp\left(iB(\omega)\right).$$

If A(ω) and B(ω) are expanded in a Taylor's series about ω = ω₀ and terms up to the second order retained, it can be shown that the response to a unit impulse function is given by

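(11)  [Equation (11) is largely illegible in the source. The surviving fragments show that it has the form

$$V_n(t) \approx A(\omega_0)^n\, G(\omega_0)\, \exp\left(-\frac{(t-t_0)^2}{2nH(\omega_0)}\right) \cos\left[\omega_0 t + nB(\omega_0) + \cdots\right],$$

where G(ω₀), H(ω₀) > 0 and an angle θ(ω₀), defined through an arctangent, are functions of A(ω₀), A''(ω₀) and B''(ω₀), and where]

$$t_0 = -nB'(\omega_0).$$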

Thus Vn(t) can be interpreted as an amplitude modulated wave with an envelope proportional to the Gauss error curve

$$\exp\left\{-\frac{(t-t_0)^2}{2nH(\omega_0)}\right\}$$

with a standard deviation given by

$$\sigma = \left[nH(\omega_0)\right]^{1/2}.$$

The standard deviation σ is of course a convenient measure of the duration of the disturbance. The maximum response occurs for time t₀ = −nB'(ω₀) at which time the amplitude is proportional to

$$\frac{A(\omega_0)^n}{\sqrt{n}}.$$

Thus if A(ω₀) > 1, the maximum response will represent a value which is very large compared with unity, the magnitude of the original disturbance, if n is large. This would force any system involving vacuum tubes to overload if n were sufficiently large.
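For the band-pass example with α = 2 the quantities entering these statements can be evaluated directly. The sketch below is illustrative only: it assumes y(p) = p(p + 2)/(p + 1)², takes the peak frequency ω₀ = √2 from setting the derivative of the gain to zero (a calculation made here), estimates the phase slope B'(ω₀) by a central difference, and reports the time of maximum response −nB'(ω₀) together with the growth factor A(ω₀)^n/√n. The function names are not from the memorandum.

```python
import cmath
import math

ALPHA = 2.0                 # band-pass value of the parameter used in the text
W0 = math.sqrt(2.0)         # peak frequency for alpha = 2 (assumption, derived here)


def y(p):
    """Transfer ratio y(p) = p(p + ALPHA)/(p + 1)^2."""
    return p * (p + ALPHA) / (p + 1.0) ** 2


def phase_slope(w, h=1e-5):
    """Central-difference estimate of B'(w), where B(w) = arg y(iw)."""
    b_hi = cmath.phase(y(complex(0.0, w + h)))
    b_lo = cmath.phase(y(complex(0.0, w - h)))
    return (b_hi - b_lo) / (2.0 * h)


def time_of_maximum(n):
    """t0 = -n B'(w0) for a chain of n units."""
    return -n * phase_slope(W0)


def peak_scale(n):
    """A(w0)^n / sqrt(n): the factor to which the maximum response is proportional."""
    return abs(y(complex(0.0, W0))) ** n / math.sqrt(n)


if __name__ == "__main__":
    for n in (1, 10, 50, 100):
        print(n, time_of_maximum(n), peak_scale(n))
```

In this example B'(ω₀) = −1/3, so the maximum arrives at about n/3 time units, and the growth factor already exceeds 10⁵ at n = 100.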

These properties are summarized in Figures (14) and (15). Figure (14) is a plot of the response for values of t near t₀ for a few values of n for the example given by equation (4) where α = 2. Figure (15) is a plot of the maximum response for a few values of n for different values of the parameter α.

It should be remarked that the above approximation to the gain which was obtained by keeping only the first two terms of the expansion of A(ω) about ω = ω₀ could only be expected to be a reasonable one for fairly large values of n, since it represents a usually unsymmetric gain characteristic by a symmetric function. A better or second approximation can be obtained by using three terms of the Taylor's expansion instead of two. Just as in the high pass case, the retention of this extra term gives rise to a second term in the expression for Vn(t) but it does not fundamentally alter the characteristics of the response since the correction term vanishes for t = t₀, at which time the response is still a maximum, with the same amplitude as before. Its only effect is to take cognizance of the unsymmetrical character of the gain characteristic A(ω) and to change the resulting response envelope to an unsymmetrical one. Of course, it also modifies the phase of the oscillation inside the envelope in a complicated way without changing the fundamental frequency of oscillation.


For  these  reasons  and  because  of  the  complexity  of  the 
resulting  expression,  it  will  not  be  written  down  here  explicitly 
although  the  explicit  approximation  to  the  gain  A(w)  will  be 
discussed  in  Part  II. 

The two approximations to the gain are illustrated for equation (4) with α = 2 in Figure 16 for n = 100. In this case

$$A(\omega) = \frac{|\omega|\sqrt{\omega^2+4}}{\omega^2+1}.$$

As can be seen from the figure, the second approximation does in fact represent A(ω) over the significant range of frequencies near ω₀ from which it can be concluded that the response will be unsatisfactory. Figure (14), previously referred to, furnishes a picture of the envelope response as obtained from the first approximation.

In the event that A(ω) takes on its maximum value at more than one place in the finite frequency range, it is clear that the above results can be generalized as follows:

Let V_{ni}(t) be the response of the form given by equation (11) due to a maximum at ω = ω_i. Let the time of maximum response from this maximum be denoted by t_i = −nB'(ω_i). Then the total response is clearly given by the expression

$$V_n(t) = \sum_{i=1}^{k} V_{ni}(t),$$

if there are k relative maxima. Unless the values of A(ω) at the points ω = ω_i are nearly the same, it is also clear that only those terms of the above sum which correspond to the largest maxima of A(ω) will be of significance.

The  band-pass  case  is  also  discussed  briefly  for  unit 
step  inputs  in  Part  II. 

Low  Pass  Case 

Since the low-pass case differs from the band-pass case only in that A(ω) has its maximum for ω = 0 instead of at ω = ω₀ ≠ 0, the results of the two are very similar. The results in the low-pass case are simpler because it will be recalled that B(ω) (as defined by equation 10) is an odd function of ω for any physical network. This forces both B(0) and B''(0) to be zero so that for an impulsive input one obtains the simple formula:

(12)  $$V_n(t) \approx \frac{A(0)^{n+1/2}}{\sqrt{2\pi n}}\left[-A''(0)\right]^{-1/2}\exp\left(\frac{(t-t_0)^2 A(0)}{2nA''(0)}\right)$$

This result corresponds to the well-known formula from transmission line theory for non-distortionless lines.


From the practical viewpoint the above results have the following implications for communications systems such as a cross-country coaxial telephone system employing self-regulating repeaters spaced at intervals of a few miles.

(1)     If  the  transfer  characteristic  of  each  individual 
network  is  of  the  high-pass  type  (in  the  sense  in  which  this  term 
has  been  used  above)  then  the  transient  response  will  never  exceed 
the  initial  value  of  the  disturbing  input  voltage  and  it  will 
be  damped  out  so  that  the  operation  of  the  communication  system 
would  generally  be  considered  satisfactory. 


(2) If the network is not of the high-pass type, the usual practical case, and there is any net gain in the system, which is peaked at ω₀, then for even a small number of units the response will exceed the initial input at the time given by

$$t_0 = -nB'(\omega_0)$$

where

$$A'(\omega_0) = 0$$

and if the number of units is sufficiently large the output from the n-th unit will be large enough to cause severe overloading.
At first glance these implications are not promising and seem to indicate that the operation of a cross-country system involving several hundred repeaters and regulators would be extremely difficult, since the only satisfactory characteristic is difficult to attain in practice. However, practically the ideal characteristic which is high pass can be approached in the sense that the peaked frequency can be made very large. Thus the maximum response may occur so soon after the initial disturbance that the physical system would not be able to follow it or
to  distinguish  it  from  the  initial  disturbance  which  in  many 
cases  would  be  large  enough  to  cause  momentary  overloading  of  the 

Moreover, it is an experimental fact that in the design of feedback regulator characteristics, forcing the peaked frequency higher reduces the size of the peak which in turn will permit the use of a larger number of regulators in the system.

If this is done, the time of maximum response, t₀ = −nB'(ω₀), will be small since B'(ω) in general is small for large ω. Assuming that the effects of the maximum response have been treated in this way, it is natural to inquire into the type of response which will result for finite values of t > t₀.

If one examines the gain characteristic curve of the type shown in Figure (7), it is clear that for frequencies less than some frequency ω, slightly less than the peak frequency ω₀, the shape is fundamentally like that of the high-pass case. Remembering that the phase delay of a frequency through a linear network is given by the slope of phase characteristic at that frequency, it is clear that the response for values of t greater than t₀, the time of maximum response, will come from the frequencies less than ω₀, since the phase slope characteristic is large for small frequencies and small for large frequencies. Now if it is assumed that the phase characteristic nB(ω) is a monotonic decreasing function of ω, it is clear that the function (nB(ω) + ωt) will always be stationary at an arbitrary frequency ω, provided that t is given a suitable corresponding value. Thus, it is reasonable to expect that the response for t » t₀* will exhibit the same type of character as that obtained in the high-pass case discussed above. This, it will be recalled, is both frequency and amplitude modulated with an envelope which decreases approximately exponentially. Thus, under these circumstances it seems reasonable to suppose that satisfactory operation of the communication link could be obtained.

To  recapitulate,  the  most  practical  design  for  any 
system  of  the  type  envisaged  in  Figure  1,  from  the  viewpoint  of 
satisfactory  transient  response  involves  approaching  the  high- 
pass  characteristic  as  closely  as  possible  by  making  the  gain 
characteristic  of  the  transfer  ratio  peak  at  as  high  a  frequency 
as  is  practicable  and  by  keeping  the  phase  slope  characteristic 
monotonic  for  all  smaller  frequencies. 


Mathematical  Discussion 

Theorem I. A necessary condition that the response Vn(t) from a chain of n four-terminal linear invariable networks subject to a unit step input function have a common finite bound for all n is that the transfer ratio y(p) satisfy the relation

(M)  $$|y(i\omega)| \le 1 \quad \text{for all real values of } \omega.$$

* A different type of expansion, valid for any fixed t or n → ∞, is discussed at the end of Part II.



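Proof: By hypothesis

$$|V_n(t)| < M \quad \text{for all } n, \text{ where } M \text{ is independent of } n \text{ and } t,$$

so that, with

$$V_n(p) = \int_0^\infty e^{-pt}\,V_n(t)\,dt = \frac{y(p)^n}{p}, \qquad y(p)^n = p\,V_n(p),$$

one has

$$|y(p)|^n = |p|\left|\int_0^\infty e^{-pt}\,V_n(t)\,dt\right| \le |p|\int_0^\infty |e^{-pt}|\,|V_n(t)|\,dt < |p|\,M\int_0^\infty |e^{-pt}|\,dt.$$

If p = c + iω and if c > 0, then

$$|y(p)|^n < \frac{M\,(c^2+\omega^2)^{1/2}}{c}$$

so that

$$\log|y(p)| < \frac{1}{n}\log\frac{M\,(c^2+\omega^2)^{1/2}}{c}.$$

Thus, in the limit as n → ∞, it follows that for any p with a positive real part

$$\log|y(p)| \le 0$$

and hence

$$|y(p)| \le 1.$$

Since this relation holds everywhere in the right-hand half plane, it follows from simple continuity considerations that the maximum of |y(iω)| never exceeds 1. Thus

$$|y(i\omega)| \le 1$$

as was to be shown.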

The  remaining  discussion  will  be  devoted  to  the 
characterization  of  the  different  types  of  possible  responses 
and  will,  at  the  same  time,  furnish  an  indirect  proof  of  the 
fact  that  the  condition  (M)  on  y(p)   is  also  sufficient. 

High  Pass  Case  -  Unit  Step  Input 

If the networks comprising the system shown in Figure 1 possess a transfer ratio having a high-pass gain characteristic in the sense defined above, and if one writes

$$y(i\omega) = A(\omega)\,e^{iB(\omega)}$$

then the gain function A(ω) satisfies the two conditions

(A)  A(ω) < 1 for all finite frequencies ω.

(B)  $$\lim_{\omega\to\infty} A(\omega) = 1$$

Under these conditions it is clear that, for sufficiently large n, the main contributions to Vn(t) will be due to the high values of |ω|. For convenience, Vn(t) is written here in slightly different form

$$V_n(t) = \frac{1}{\pi}\,\text{Re}\left\{\int_0^\infty A(\omega)^n\, e^{i[nB(\omega)+\omega t-\pi/2]}\,\frac{d\omega}{\omega}\right\}$$


For large values of |ω|, all physical transfer ratios y(iω) of interest to us here can be represented by an expansion of the form*

(13)  $$y(i\omega) = 1 + \frac{ai}{\omega} + \frac{b}{\omega^2} + \frac{ci}{\omega^3} + \frac{d}{\omega^4} + \cdots$$

We shall confine our attention to the ordinary case, in which a > 0, b < 0 and 2b + a² < 0. For large values of |ω|, we now have

(14)  $$A(\omega) = \left\{\left[1 + \frac{b}{\omega^2} + \frac{d}{\omega^4} + \cdots\right]^2 + \left[\frac{a}{\omega} + \frac{c}{\omega^3} + \cdots\right]^2\right\}^{1/2}$$

(15)  $$B(\omega) = \arctan\left\{\frac{\dfrac{a}{\omega} + \dfrac{c}{\omega^3} + \cdots}{1 + \dfrac{b}{\omega^2} + \dfrac{d}{\omega^4} + \cdots}\right\}$$

It is clear that, for |ω| sufficiently large, the leading terms of these expressions will furnish adequate approximations to A(ω) and B(ω). These are:

(16)  $$A(\omega) \approx \left[1 + \frac{a^2 + 2b}{\omega^2}\right]^{1/2}$$

(17)  $$B(\omega) \approx \frac{a}{\omega}.$$

Let ω₀ be the frequency defined by the condition that these approximations are accurate to within the arbitrarily chosen permissible error ε for values of ω such that ω > ω₀. Then we can write

* In the usual case y(p) is a rational function, so that this expansion can be readily obtained.


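$$V_n(t) = \frac{1}{\pi}\,\text{Re}\int_0^\infty A(\omega)^n e^{i[nB(\omega)+\omega t-\pi/2]}\,\frac{d\omega}{\omega} = \frac{1}{\pi}\,\text{Re}\left\{\int_0^{\omega_0} + \int_{\omega_0}^{\infty}\right\} = \frac{1}{\pi}\,\text{Re}\,(I_1 + I_2).$$

It is clear that

$$|I_1| \le \int_0^{\omega_0} \frac{A(\omega)^n}{|\omega|}\,d\omega.$$

Since [A(ω)]^n → 0 for each ω in the finite range 0 < ω < ω₀, it is clear that |I₁| can be made negligibly small by taking n sufficiently large. Introducing the new variable v defined by the relation

$$v = \omega\left(\frac{t}{na}\right)^{1/2},$$

I₂ can be written as

$$I_2 = \int_{v_0}^{\infty}\left[1 + \frac{(a^2+2b)t}{nav^2}\right]^{n/2} e^{i[\sqrt{nat}\,(1/v+v)-\pi/2]}\,\frac{dv}{v}, \qquad v_0 = \omega_0\left(\frac{t}{na}\right)^{1/2},$$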

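and using the binomial expansion, one has

$$\left[1 + \frac{(a^2+2b)t}{nav^2}\right]^{n/2} = 1 + X + \frac{1}{2}\left(1 - \frac{2}{n}\right)X^2 + \cdots = e^{X} + \text{terms in } 1/n, \qquad X = \frac{(a^2+2b)t}{2av^2}.$$

Thus, for sufficiently large n, I₂ becomes, approximately

(18)  $$I_2 \approx \int_{v_0}^{\infty} \frac{1}{v}\exp\left[\frac{(a^2+2b)t}{2av^2}\right] e^{i[\sqrt{nat}\,(1/v+v)-\pi/2]}\,dv$$

(v₀ being the value of v corresponding to ω = ω₀). In this form the principle of stationary phase can be applied to I₂ (Cf. Appendix I); for the amplitude factor

$$\frac{1}{v}\exp\left[\frac{(a^2+2b)t}{2av^2}\right]$$

is independent of n, while the phase function (in the notation of the appendix)

(19)  $$\varphi(v) = \frac{1}{v} + v$$

is monotonic in the range of integration on each side of the stationary point (v = 1) where

$$\varphi'(v) = 0.$$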


Physically speaking the form of equation (18) suggests the interpretation of Vn(t) as the sum of an infinite number of complex waves whose amplitudes are slowly varying functions of v and whose complex phases are rapidly varying functions of v. Under this interpretation it is physically reasonable to expect that wave interference will occur everywhere except near v = 1 where the phase function given by equation (19) is stationary. This is the principle of stationary phase. It remains to evaluate the principal contribution to I₂ for values of v near 1.

Replacing φ(v) by the first three terms of its Taylor's series about v = 1,

$$\varphi(v) = \varphi(1) + 0 + \frac{\varphi''(1)}{2}(v-1)^2 = 2 + (v-1)^2$$

the main contribution to I₂ is given by

$$I_2 \approx e^{i[2\sqrt{nat}-\pi/2]}\int_{1-\eta}^{1+\eta}\frac{1}{v}\exp\left[\frac{(a^2+2b)t}{2av^2}\right]e^{i\sqrt{nat}\,(v-1)^2}\,dv.$$

In the interval (1 − η, 1 + η), the amplitude factor

$$\frac{1}{v}\exp\left[\frac{(a^2+2b)t}{2av^2}\right]$$

is substantially constant and may be removed from under the integral sign and evaluated at v = 1. By the reasoning of Appendix I, the contributions to the remaining integral are not appreciably affected if the limits are changed to (−∞, ∞) respectively. Letting

$$\xi = v - 1$$

we can then write I₂ in the form

$$I_2 \approx \exp\left[\frac{(a^2+2b)t}{2a}\right]\exp\left[i\left(2\sqrt{nat}-\frac{\pi}{2}\right)\right]\int_{-\infty}^{\infty}e^{i\sqrt{nat}\,\xi^2}\,d\xi$$

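By the known properties of Fresnel integrals

$$\int_{-\infty}^{\infty}e^{i\sqrt{nat}\,\xi^2}\,d\xi = \pi^{1/2}(nat)^{-1/4}e^{i\pi/4}$$

and hence

$$I_2 \approx \pi^{1/2}(nat)^{-1/4}\exp\left[\frac{(a^2+2b)t}{2a}\right]\exp\left[i\left(2\sqrt{nat}-\frac{\pi}{4}\right)\right].$$

Taking the real part and dividing by π, the asymptotic expression for Vn(t) is therefore given by:

(20)  $$V_n(t) \approx \pi^{-1/2}(nat)^{-1/4}\exp\left(\frac{(a^2+2b)t}{2a}\right)\cos\left(2\sqrt{nat}-\frac{\pi}{4}\right)$$

which is equation (8) of Part I.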

A more accurate approximation to the gain A(ω)^n is given by

$$A(\omega) \approx \left[1 + \frac{2b+a^2}{\omega^2} + \frac{2d+b^2+2ac}{\omega^4}\right]^{1/2}$$

where the first three terms of equation (13) have been retained. From this it follows that:

$$A(\omega)^n \approx \exp\left\{\frac{n}{2}\left(\frac{2b+a^2}{\omega^2} + \frac{2d+b^2+2ac}{\omega^4}\right)\right\} = \exp\left[\frac{n(2b+a^2)}{2\omega^2}\right]\exp\left[\frac{n(2d+b^2+2ac)}{2\omega^4}\right]$$

from which it follows that the second approximation is obtained by multiplying the first by the factor

$$\exp\left[\frac{n(2d+b^2+2ac)}{2\omega^4}\right].$$

If the frequency transformation

$$v = \omega\left(\frac{t}{na}\right)^{1/2}$$

is now made, the first factor will as before be independent of n. Over the range of integration where the integral is significant their product can be removed from under the integral sign giving

$$V_n(t) \approx \pi^{-1/2}(nat)^{-1/4}\cos\left(2\sqrt{nat}-\frac{\pi}{4}\right)\exp\left[\frac{(a^2+2b)t}{2a}\right]\exp\left[\frac{(2d+b^2+2ac)t^2}{2a^2n}\right]$$

$$\approx \pi^{-1/2}(nat)^{-1/4}\cos\left(2\sqrt{nat}-\frac{\pi}{4}\right)\exp\left[\frac{(a^2+2b)t}{2a}\right]\left\{1 + \frac{(2d+b^2+2ac)t^2}{2a^2n} + \cdots\right\}$$

which is the equation (9) of Part I.

Band Pass Case - Impulsive Input

For simplicity let it be assumed that the gain characteristic A(ω) has only one absolute maximum at ω = ω₀ on the positive frequency range and that this is a second order maximum. The response Vn(t) can always be written in the form

$$V_n(t) = \frac{A(\omega_0)^n}{\pi}\,\text{Re}\left\{\int_0^\infty \exp\left[n\log\frac{A(\omega)}{A(\omega_0)} + inB(\omega) + i\omega t\right]d\omega\right\}.$$

In this form, Vn(t) can again be interpreted as being proportional to the sum of an infinite number of complex waves of amplitude

$$\exp\left[n\log\frac{A(\omega)}{A(\omega_0)}\right]$$

with varying complex phase* given by

$$\varphi(\omega,t,n) = nB(\omega) + \omega t.$$

With this interpretation it is clear that the maximum contribution to Vn(t) will be given by those frequencies in the neighborhood of ω₀, where ω₀ satisfies A'(ω) = 0, and at values of the time t near t₀ at which the phase function φ(ω,t,n) is stationary for the maximum frequency ω₀. Thus t₀ is given by

$$t_0 = -nB'(\omega_0).$$

Since

$$A(\omega_0) \ne 0 \quad \text{and} \quad A'(\omega_0) = 0$$

*"Phase" as used here differs from the way it is normally used in engineering.

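one can write for a suitably small neighborhood of ω₀

(21)  $$\log\frac{A(\omega)}{A(\omega_0)} = \log\left\{1 + \frac{A''(\omega_0)}{2A(\omega_0)}(\omega-\omega_0)^2 + \frac{A'''(\omega_0)}{6A(\omega_0)}(\omega-\omega_0)^3 + \cdots\right\}$$

If we retain only the first term of this expansion, then for a suitably restricted neighborhood of ω₀, one has

$$n\log\frac{A(\omega)}{A(\omega_0)} \approx \frac{nA''(\omega_0)}{2A(\omega_0)}(\omega-\omega_0)^2.$$

Similarly, for ω sufficiently near ω₀

(23)  $$B(\omega) = B(\omega_0) + B'(\omega_0)(\omega-\omega_0) + \frac{B''(\omega_0)}{2}(\omega-\omega_0)^2.$$

Henceforth for simplicity, we shall write

A = A(ω₀), A'' = A''(ω₀), B = B(ω₀), B' = B'(ω₀), B'' = B''(ω₀).

If these approximations are valid in the neighborhood (ω₀ − Δ, ω₀ + Δ), it follows that

$$V_n(t) = \frac{1}{\pi}\,\text{Re}\left\{\left[\int_0^{\omega_0-\Delta} + \int_{\omega_0+\Delta}^{\infty}\right]A(\omega)^n e^{i[nB(\omega)+\omega t]}\,d\omega\right\} + \frac{A^n}{\pi}\,\text{Re}\left\{\int_{\omega_0-\Delta}^{\omega_0+\Delta}\exp\left[\frac{nA''}{2A}(\omega-\omega_0)^2 + i\left(nB + nB'(\omega-\omega_0) + \frac{nB''}{2}(\omega-\omega_0)^2 + \omega t\right)\right]d\omega\right\}$$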

Since [A(ω)]^n → 0 as n → ∞, except near ω = ω₀, it follows as before that the sum of the bracketed integrals can be made negligibly small in comparison with the remaining one if n is taken sufficiently large. Recalling that

$$t_0 = -nB'(\omega_0)$$

the remaining integral can be written as

$$V_n(t) \approx \frac{A^n}{\pi}\,\text{Re}\left\{e^{i(nB+\omega_0 t)}\int_{\omega_0-\Delta}^{\omega_0+\Delta}\exp\left[\frac{n}{2}\left(\frac{A''}{A}+iB''\right)(\omega-\omega_0)^2 + i(t-t_0)(\omega-\omega_0)\right]d\omega\right\}$$

Again the finite limits of integration can be replaced by −∞ and ∞ since, for large n,

$$\exp\left[\frac{nA''}{2A}(\omega-\omega_0)^2\right]$$

will be small except in the immediate neighborhood of ω₀. If one sets

$$\beta = -\frac{n}{2}\left(\frac{A''}{A}+iB''\right), \qquad p^2 = i^2(\omega-\omega_0)^2, \qquad g = t-t_0,$$

then the remaining integral can be recognized as pair No. 710.0 of the Campbell and Foster Tables.
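The integral remaining at this step is a Gaussian integral with a complex coefficient, which the memorandum evaluates from tables. As an independent check, the sketch below compares a direct numerical quadrature with the standard closed form: the integral of exp(−ρs² + iτs) over the whole real line equals √(π/ρ) exp(−τ²/(4ρ)) whenever the real part of ρ is positive. The symbols ρ and τ and the function names are choices made here; τ plays the part of t − t₀ and ρ that of the quadratic coefficient, and the sample values are arbitrary.

```python
import cmath
import math


def gaussian_transform_closed(rho, tau):
    """Closed form sqrt(pi/rho) * exp(-tau^2 / (4 rho)), valid for Re(rho) > 0."""
    return cmath.sqrt(math.pi / rho) * cmath.exp(-tau * tau / (4.0 * rho))


def gaussian_transform_numeric(rho, tau, half_width=12.0, steps=4800):
    """Trapezoidal estimate of the integral of exp(-rho s^2 + i tau s) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0 + 0.0j
    for k in range(steps + 1):
        s = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * cmath.exp(-rho * s * s + 1j * tau * s)
    return total * h


if __name__ == "__main__":
    rho = complex(1.0, -0.5)    # arbitrary sample value with positive real part
    tau = 0.7                   # arbitrary sample value of t - t0
    print(gaussian_transform_numeric(rho, tau))
    print(gaussian_transform_closed(rho, tau))
```

The real part of 1/(4ρ) controls the width of the resulting Gaussian envelope in time, while its imaginary part produces a slowly varying phase.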

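Then one finds

$$V_n(t) \approx \frac{1}{\pi^{1/2}}\,\text{Re}\left\{\frac{A^n}{\sqrt{\beta}}\exp\left[inB + i\omega_0 t\right]\exp\left[-\frac{(t-t_0)^2}{4\beta}\right]\right\}$$

where β = −(n/2)(A''/A + iB'').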


The result is equivalent to that given by equation (11) of Part I. If A(ω₀) is greater than 1, it is thus seen that the response will have a maximum value that builds up very rapidly as n increases and would eventually force any system involving vacuum tubes to overload.

It should be remarked that the above approximation to the gain could only be expected to be a reasonable one for fairly large values of n, since it represents a usually unsymmetric gain characteristic by a symmetric function. A better or second approximation can be obtained by keeping the second term of the expansion of the logarithm in (21), and then taking the first term of the expansion of

$$(\omega-\omega_0)^3.$$

This yields

The addition of the second term in the above expression gives rise to an additional term in Vn(t), provided that the same phase approximation (23) is retained. The resulting Vn(t) is similar to (11) but the new envelope consists of the old envelope plus nA'''/6A times the third derivative of the old envelope. The modulated frequency remains the same but the phase is changed in a complicated manner. (Compare pair 710.3 of the Campbell and Foster tables).

Unit  Step  Input 

In  this  case  one  can  write 

Vn(t)  =  -  Re 


i[nB/u)  +  g] 


As before the only significant frequencies are in the
neighborhood of $\omega = \omega_0$, and near this point the $\omega$ in the
denominator can be taken out of the integral as $1/\omega_0$ provided
$\omega_0 \neq 0$. Thus the result will be the same as for the impulsive
input apart from the factor $1/\omega_0$ if one makes $nB(\omega) - \pi/2$
correspond to $nB(\omega)$ in (11).

Low-Pass  Case 

It is clear that the analysis for this case, in which
the equation $A'(\omega) = 0$ is satisfied for $\omega = 0$, can be carried
through in exactly the same manner as the band-pass case treated
previously. The resulting answer is capable of simplification,
however, if it is recalled that $B(\omega)$ for any physical network
is an odd function of $\omega$. This forces both $B(0)$ and $B''(0)$ to
be zero. The resulting formulae then become

a)     Impulsive  Input 

b)  Unit  Step  Input 

A(0)n  e  W  A(0) 
  2n  A"(Cfr 






3/2  /2nA'  Ha) 
n      J  A(Gj 


exp  j     2nA"(»)  jdt' 


This last expression involves an integral since it
is necessary to eliminate the pole at zero where $A(\omega)$ has its
maximum. This can be done by differentiating $V_n(t)$ with
respect to t, finding the asymptotic formula for $V_n'(t)$ as before
and then integrating to obtain (24).

Hamy's Expansions in the Band-Pass Case

The type of asymptotic expansions so far given for
the band-pass case were explicitly designed to represent $V_n(t)$
in the neighborhood of $t = t_0$ where $V_n(t)$ is a maximum. They
could in no sense be considered the true asymptotic expansions
for values $t \ll t_0$ or $t \gg t_0$. In particular their derivation
depended upon the fact that the time of maximum response was
related to the number of four terminal networks by means of
the equation

$$t_0 = -nB'(\omega_0)$$

so that as $n \to \infty$, $t_0 \to \infty$.

Other  types  of  expansion  are  clearly  possible. 
Two  obvious  alternatives  are: 

(1) Those valid for fixed n as $t \to \infty$;

(2) Those valid for fixed t as $n \to \infty$.

The first of these will not be considered here since
they are of little interest, as all of the four terminal networks
have been assumed to be absolutely stable. The interested reader
is referred to the book by Doetsch on Laplace Transformations
for expansions of this type.

Since  the  second  type  of  expansion  is  of  interest 
here  and  is  not  to  be  found  in  most  of  the  standard  reference 
works  it  will  be  discussed  here  briefly. 

In a classic paper, M. Hamy* derived general
expansions of this type for complex integrals of the form

$$\int f(z)\,\varphi^n(z)\,dz$$

*Journal de Mathématiques, vol. 4, 6th series, 1908, page 203.

under a variety of hypotheses on $f(z)$ and $\varphi(z)$. These
conditions include the case where $\varphi(z)$ has a saddle point given
by the solution of $\varphi'(z) = 0$, and the result of this case is a
generalization of the often-used theorem of Fowler which one
finds in his book on statistical mechanics under the title of
the saddle point method.

More to the point, they also include the case
where $\varphi(z)$ has one or more maxima on the path of integration
at which $\varphi'(z) = 0$, provided that $f(z)$ admits a Taylor series
expansion about these points. In particular, then, if one
considers t as a fixed parameter they apply to the integral
of equation (1), with $c = 0$ and $\varphi(z) = y(p)$; $f(z) = e^{pt} v_0(p)$.

In  terms  of  our  notation,  one  finds  that: 

(a) for an impulsive input with gain maxima at $\omega = \omega_0$

2An(cO  x 
VtJ  ~  nB'(a>°)   COS  rV  +  n  B(u,o):i  +  term  in  ^  * 

(b) for a unit step input with gain maxima at $\omega = \omega_0 \neq 0$

2An(w  )  , 

Vn(t)  ?a    COS  [V  +  nB^o]^  +  termS  in  —  ' 

■  v  o'  o  n 

It is interesting to note that these formulae indicate
a dependence upon $1/n$ instead of $1/\sqrt{n}$ as in the case of the
previous expansion. These formulae can be thought of as
representing the response in the band-pass case for any fixed t,
$t \ll t_0$.


Appendix  I 


Certain remarks of Aurel Wintner* on the justification
of the principle of stationary phase are pertinent enough to
the above discussion to bear repetition here. In order for the
integral

$$S = \int f(x)\, e^{ip\varphi(x)}\,dx$$

to be asymptotically represented as $p \to \infty$ by the formula
(Cf. Lamb, Hydrodynamics, p. 395)

$$S \sim \frac{\sqrt{2\pi}\, f(a)}{\sqrt{|p|\,|\varphi''(a)|}}\; e^{i[p\varphi(a) \pm \pi/4]} \qquad (26)$$

where $\varphi'(a) = 0$ and where the upper or lower sign is to be
taken according as $\varphi''(a)$ is positive or negative, it is
evident that two things are sufficient.

(1) The contribution to the integral outside a small interval
around the stationary value a of $\varphi(x)$ must decrease more
rapidly as a function of p than the one obtained in the
neighborhood of a;

(2) The asymptotic formula given above must adequately
represent the behavior of the contribution to the integral
from the neighborhood of the stationary value a.

Now, if, on any closed interval I, $\varphi'(x)$ is continuous
and has no zeros, and if $\varphi(x)$ is strictly monotone in this
interval, then $z = \varphi(x)$ can be introduced as a variable of integration
on that interval, transforming S into

* Method of Stationary Phase, Journal of Math. & Physics,
vol. 24, no. 3-4, 1945.


$$\int f(x)\, e^{ip\varphi(x)}\,dx = \int f[\varphi^{-1}(z)]\, e^{ipz}\,dz$$

If, in addition to the above, $\varphi'(x)$ and $\varphi''(x)$ are continuous
and if $f(x)$ and $f'(x)$ exist and are continuous, this last
integral can be integrated by parts, giving

S  = 

|  fr^uneip2j 






e±PZ  A  fCT_i(z)]dz 


and  showing  that  on  any  such  interval  I, 


Thus, condition (1) will be satisfied if, in the
neighborhood of the stationary value a, the contribution to
the integral is greater than $O(1/p)$.


This is clearly the case when the asymptotic formula
(26) is valid, since there the dependence on p is as $1/\sqrt{p}$.
It can be shown that (26) is valid whenever
$\varphi'(a) = 0$, $\varphi''(a) \neq 0$ and $\varphi''(x)$ and $f(x)$
are of bounded variation in the neighborhood of the stationary
value. Thus, to recapitulate, under these conditions, the
maximum contribution comes from the stationary point and depends
on p as $1/\sqrt{p}$, while the points which are not near the stationary
point contribute terms depending upon p only as $1/p$.
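
The recapitulation above can be checked numerically. The following is a minimal sketch, not part of the original memorandum: it assumes the hypothetical test case $f(x) = e^{-x^2}$, $\varphi(x) = x^2$ (stationary point a = 0, $\varphi''(a) = 2$) and compares a brute-force trapezoid estimate of the integral with the leading stationary-phase term. The function names are illustrative only.

```python
import cmath
import math

def oscillatory_integral(f, phi, p, lo, hi, steps):
    """Trapezoid-rule estimate of the integral of f(x) exp(i p phi(x)) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0j
    for k in range(steps + 1):
        x = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * f(x) * cmath.exp(1j * p * phi(x))
    return total * h

def stationary_phase(f_a, phi_a, phi2_a, p):
    """Leading term f(a) sqrt(2 pi / (|p| |phi''(a)|)) exp(i [p phi(a) +/- pi/4])."""
    sign = 1.0 if phi2_a > 0 else -1.0
    amplitude = f_a * math.sqrt(2.0 * math.pi / (abs(p) * abs(phi2_a)))
    return amplitude * cmath.exp(1j * (p * phi_a + sign * math.pi / 4.0))

# Hypothetical test case: Gaussian amplitude, quadratic phase.
f = lambda x: math.exp(-x * x)
phi = lambda x: x * x
p = 50.0

numeric = oscillatory_integral(f, phi, p, -6.0, 6.0, 60000)
approx = stationary_phase(f(0.0), phi(0.0), 2.0, p)
relative_gap = abs(numeric - approx) / abs(approx)
print("relative gap between integral and stationary-phase term:", relative_gap)
```

For this test case the gap shrinks roughly like 1/p, which is the faster decay attributed above to the points away from the stationary value.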

To conclude this brief appendix, it should be remarked
that Wintner gives an extension of (10) which is valid under
the same condition of $f[\varphi^{-1}(z)]$ if the first n derivatives of
$\varphi(x)$ vanish at some point a while $\varphi^{(n+1)}(x)$ does not. These results
could be used to extend the treatment of the high-pass case
given above to the cases in which $a^2 + 2b = 0$, etc.

C.  L.  DOLPH 


[Figs. 3 to 16 (drawings B-392415 to B-392428) are not reproduced in this text.]


Electronic  Methods  in  Telephone  Switching 

C.  E.  Shannon 

In  the  recent  development  of  electronic  digital  computing  machines  various  new 
tubes  and  other  electronic  devices  have  been  designed  which  may  be  of  use  in 
machine  switching.  In  particular  the  "selectron"  tube  developed  by  R.  C.  A.  and  the 
mercury  acoustic  delay  tank  provide  large  cheap  memory  devices  in  which  information 
can  be  registered  or  read  off  in  electronic  time  intervals  (of  the  order  of 
microseconds).  Since  one  of  the  chief  functions  of  the  relays  and  switches  in  a 
telephone  exchange  is  that  of  memory  (e.g.  the  relays  remember  which  calling  and 
called  lines  should  be  connected  together)  it  is  worth  while  considering  the  possibility 
of  using  such  tubes  to  replace  ordinary  electro-mechanical  switching  equipment. 

Suppose  we  have  an  exchange  (or  set  of  exchanges)  serving  n  subscribers  and  that 
the  exchange  can  handle  a  peak  load  of  m  simultaneous  conversations.  These  may  be 
between  any  m  pairs  of  the  subscribers.  Thus  the  exchange  must  be  capable  of 
assuming as many different states as there are ways of selecting m pairs of objects from n.
This can be done in

$$\frac{n!}{m!\,2^m\,(n - 2m)!}$$

different ways. For n and m large the logarithm of this is approximately $2m \log n$.
If  the  logarithm  is  to  the  base  ten  then  this  is  the  required  memory  capacity  of  the 
exchange  measured  in  decimal  digits.   If  the  logarithmic  base  is  two  the  units  are 

binary digits. A single two-position relay has a capacity of $\log 2$ units (one binary
digit or .30103 decimal digits), while s relays have $s \log 2$ units. A 10 x 10 crossbar
switch has a capacity of $10 \log 10$, while a single commutator on a panel has capacity
$\log r$, where r is the number of vertical positions of the brushes. Hence the number
of relays required for a pure relay exchange would be

$$\frac{2m \log n}{\log 2},$$

the number of 10 x 10 crossbars would be

$$\frac{2m \log n}{10 \log 10},$$

etc.  To  these  estimates  must  be  added  the  losses  due  to  inefficient  use  of  the  memory 
and  also  the  memory  of  equipment  used  for  functions  other  than  merely  remembering 
which  connections  are  being  held  at  a  given  time. 
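
The capacity estimate above is easy to evaluate. The sketch below is an illustration only, with hypothetical exchange sizes (n = 10000 subscribers, m = 500 simultaneous conversations); it computes the exact logarithm of the number of states from the pair-counting formula, the $2m \log n$ approximation, and the resulting relay and crossbar counts. For moderate n the approximation overestimates the exact value, since it ignores the m! and $2^m$ factors.

```python
import math

def log10_states(n, m):
    """log10 of n! / (m! 2^m (n-2m)!), the number of ways to pick m disjoint pairs from n."""
    ln_states = (math.lgamma(n + 1) - math.lgamma(m + 1)
                 - m * math.log(2.0) - math.lgamma(n - 2 * m + 1))
    return ln_states / math.log(10.0)

def approx_log10_states(n, m):
    """The large-n approximation 2 m log10(n), in decimal digits."""
    return 2 * m * math.log10(n)

def relays_needed(n, m):
    """Two-position relays, each storing log10(2) decimal digits."""
    return math.ceil(approx_log10_states(n, m) / math.log10(2.0))

def crossbars_needed(n, m):
    """10 x 10 crossbar switches, each storing 10 log10(10) = 10 decimal digits."""
    return math.ceil(approx_log10_states(n, m) / (10 * math.log10(10.0)))

# Hypothetical exchange size, chosen only for illustration.
n, m = 10000, 500
exact = log10_states(n, m)
approx = approx_log10_states(n, m)
print("exact decimal digits:", exact)
print("approximate decimal digits:", approx)
print("relays:", relays_needed(n, m), "crossbars:", crossbars_needed(n, m))
```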

An  ordinary  relay  is  capable  of  remembering  (by  a  holding  circuit)  one  binary 
digit.  A  pair  of  vacuum  tubes  in  a  flip-flop  circuit  has  the  same  memory  capacity. 
The  cost  of  these  is  of  comparable  magnitude,  and  thus  if  one  designed  an  electronic 
telephone  exchange  by  merely  changing  relays  to  equivalent  vacuum  tube  circuits  the 
chief  advantage  of  the  electronic  circuit  would  be  one  of  speed,  an  improvement  of 
order $10^3$. In many cases this could produce a reduction of cost since frequently many
identical  units  of  a  certain  type  must  be  supplied  because  the  individual  units  are  slow. 
This  is  apt  to  be  the  case  with  units  which  are  associated  with  the  beginning  or  end  of 
calls  but  need  not  be  used  during  the  conversation.  On  the  other  hand  equipment  to 
be used throughout the call would offer less advantage under this tube-for-relay
replacement since the expected duration of calls is long compared to electronic times.

The  newer  electronic  memory  devices,  however,  change  this  picture  considerably. 
A  selectron  tube  (when  these  tubes  are  in  production)  may  be  expected  to  cost  $100  or 
less  depending  on  the  demand.  It  is  capable  of  holding  4096  binary  digits,  giving  a 
cost  per  binary  digit  of  the  order  of  2.5  cents,  while  the  cost  of  the  equivalent  relay 
may  be  of  the  order  of  2.5  dollars.  Mercury  delay  lines  can  store  information  at  a 
comparable  cost.  Thus  it  is  not  impossible  that  a  reduction  of  the  order  100  to  1  in 
switching  equipment  cost  might  be  possible  by  the  use  of  electronic  devices,  even  in 
the  parts  where  information  must  be  stored  for  long  periods  of  time. 
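
The cost comparison in this paragraph is simple arithmetic; the short sketch below just restates it with the figures quoted in the text (a $100 selectron holding 4096 binary digits, and a relay of the order of $2.50 per binary digit). The variable names are illustrative.

```python
# Figures quoted in the text; the variable names are illustrative only.
selectron_cost_dollars = 100.0
selectron_bits = 4096
relay_cost_dollars_per_bit = 2.5

# Cost of one binary digit of selectron storage, in cents.
selectron_cents_per_bit = 100.0 * selectron_cost_dollars / selectron_bits

# Ratio of relay cost to selectron cost per binary digit.
cost_ratio = relay_cost_dollars_per_bit / (selectron_cost_dollars / selectron_bits)

print("selectron cents per binary digit:", selectron_cents_per_bit)
print("relay / selectron cost ratio:", cost_ratio)
```

The ratio comes out close to the "order 100 to 1" reduction mentioned above.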

An  indication  of  how  such  tubes  may  be  used  is  given  in  the  attached  figure. 
Fig.  1  is  a  block  diagram  of  a  simplified  exchange.  The  calling  parties  are  connected 
to  an  electronic  commutator  which  samples  the  speech  signals  periodically  and  puts 
the  various  lines  in  the  time  division  multiplex.  The  called  parties  are  also  connected 
in  time  division  multiplex  to  a  single  channel  by  means  of  an  electronic  commutator 
or  distributor.  The  function  of  the  middle  part  is  to  rearrange  the  samples  in  such  a 
way  as  to  provide  any  desired  interconnection  between  calling  and  called  parties.  This 
is  done  by  dividing  the  sampling  period  into  two  equal  parts.  During  the  first  half  the 
signal  plate  of  the  upper  selectron  is  connected  by  gate  1  into  the  calling  line 
multiplex  channel.  Its  windows  are  caused  to  open  in  sequence.  Thus  at  the  end  of 
the  first  half-cycle  the  first  samples  of  all  the  incoming  channels  have  been  written  on 
the  face  of  the  tube  in  their  regular  order.  During  the  second  half-cycle  gates  1  and  3 
are  closed  and  gates  2  and  4  are  opened.  Thus  the  output  of  the  selectron  is  fed  into 
the  called  line  multiplex  and  the  windows  of  the  selectron  are  controlled  by  the  other 
selectron tube 2. This tube has registered in a suitable notation the numbers of the
called line desired by the calling line. The windows of this tube are opened
sequentially  by  the  cycling  unit  and  the  numbers  registered  there  control  the  windows 
on  tube  1  allowing  the  sample  from  calling  channel  1  to  go  into  the  proper  place  in 
the  called  line  TDM. 
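
The two half-cycles described above can be sketched in a few lines. This is an illustrative model, not the circuit of Fig. 1: it assumes one sample per line per frame, and it assumes the control memory (selectron 2) is read in called-line order, each entry naming the window of selectron 1 to open. The function name `switch_frame` and the dictionary of connections are hypothetical.

```python
def switch_frame(calling_samples, connections):
    """Rearrange one frame of samples from calling-line order to called-line order.

    calling_samples: list of samples, index = calling line number.
    connections: dict mapping calling line -> called line (hypothetical notation).
    Idle called lines receive 0.0.
    """
    n = len(calling_samples)

    # First half-cycle: the windows of selectron 1 open in sequence and the
    # incoming multiplex is written onto the tube in its regular order.
    store = list(calling_samples)

    # Contents of selectron 2: for each called-line time slot, the number of
    # the window on selectron 1 that should be opened (None if the slot is idle).
    control = [None] * n
    for calling, called in connections.items():
        control[called] = calling

    # Second half-cycle: the cycling unit steps through selectron 2 and the
    # registered numbers select which stored sample goes to the called multiplex.
    return [store[window] if window is not None else 0.0 for window in control]

# Usage: line 0 is calling line 2, and line 3 is calling line 1.
frame_out = switch_frame([0.1, 0.2, 0.3, 0.4], {0: 2, 3: 1})
print(frame_out)
```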

By  a  more  elaborate  system  it  is  possible  to  make  use  of  the  fact  that  only  a  small 
fraction  of  the  lines  will  be  busy  at  a  given  time,  as  is  done  in  ordinary  relay 
switching.  This  can  be  achieved  by  only  supplying  enough  places  in  the  distributors 
for  the  peak  load.  When  a  call  originates  the  calling  and  called  parties  are  assigned 
idle  spaces  in  the  distributor.  The  place  assigned  to  the  called  party  is  registered  in 
the  selectron  register  corresponding  to  the  place  assigned  to  the  calling  party. 

Some  Generalizations  of  the  Sampling  Theorem 

We have seen that a function of time f(t) containing
no frequencies over W cycles per second can be described by
giving its value at Nyquist intervals (spaced $\frac{1}{2W}$ seconds apart).
It can be reconstructed from these samples using the basic
functions $\sin 2\pi Wt/2\pi Wt$, together with the same function shifted
by integer numbers of Nyquist intervals. We now consider some
generalizations of this result.

In the first place the particular function
$\sin 2\pi Wt/2\pi Wt$ is by no means necessary for the reconstruction.
In fact any function $\varphi(t)$ which contains all frequencies up to
W is satisfactory. More precisely the spectrum of $\varphi(t)$ should
not vanish over any finite set of frequencies (set of positive
measure) up to W. If $\varphi(t)$ satisfies this condition the original
function f(t) can be reconstructed using $\varphi(t)$ and its shifted
images $\varphi(t + \frac{K}{2W})$. That is, coefficients $a_K$ can be found such
that

$$f(t) = \sum_{K=-\infty}^{\infty} a_K\, \varphi\!\left(t + \frac{K}{2W}\right).$$

In general the coefficients are not found as easily as in the
special case where $\varphi(t) = \sin 2\pi Wt/2\pi Wt$ (when they are merely
the values of f(t) at the Nyquist points) but they may be
calculated as follows. Let $F(\omega)$ be the spectrum of f(t) and
$\Phi(\omega)$ be the spectrum of $\varphi(t)$. Expand the function $F(\omega)/\Phi(\omega)$ in
a Fourier series using $-W$ to $+W$ as the fundamental interval.

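$$\frac{F(\omega)}{\Phi(\omega)} = \sum_K a_K\, e^{iK\omega/2W}$$

or

$$F(\omega) = \sum_K a_K\, \Phi(\omega)\, e^{iK\omega/2W}.$$

Taking the transform of the equation we obtain the desired
expansion

$$f(t) = \sum_K a_K\, \varphi\!\left(t + \frac{K}{2W}\right).$$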

The coefficients in the expansion can therefore be determined as
the coefficients of a Fourier series expansion of $F(\omega)/\Phi(\omega)$. In
general the functions $\varphi(t + \frac{K}{2W})$ will not form an orthogonal set
and therefore the energy in f(t) cannot be found from $\sum a_K^2$ as it
was in the simple case where $\varphi(t) = \sin 2\pi Wt/2\pi Wt$.

A physical method of performing this expansion can
also be given. Consider a filter which gives the output
$\sin 2\pi Wt/2\pi Wt$ when the input is $\varphi(t)$. If the function f(t) is
passed through this filter the amplitudes of the output at
Nyquist intervals will be the desired coefficients. This is
true since this output can be considered as expanded in the
functions $\sin 2\pi Wt/2\pi Wt$ with the amplitudes as coefficients,
and the inverse filter would restore the original function and
change each of these functions with $\varphi(t)$ at the corresponding
Nyquist point.

A function f(t) can also be determined from a knowledge
of its value and derivative at alternate Nyquist points.

We have here the same number of measurements per second, 2W,
but half of these are ordinates of f(t) and half are derivatives.
The reconstruction of f(t) from these values can be carried out
simply using two basic functions:

$$\varphi_1(t) = \frac{\sin^2 \pi W t}{(\pi W t)^2}$$

$$\varphi_2(t) = \frac{\sin^2 \pi W t}{(\pi W)^2\, t}$$

Both of these lie entirely within the band W and $\varphi_1$ has the
property that it and its first derivative vanish at alternate
Nyquist points (except for t = 0 where the function is 1 and
its first derivative 0). Likewise $\varphi_2$ and $\varphi_2'$ vanish at alternate
Nyquist points except at t = 0 where $\varphi_2 = 0$ and $\varphi_2' = 1$. Thus
we can fit the ordinates of the original function f(t) using $\varphi_1$
and its shifted images (shifted by two Nyquist intervals). The
derivatives of f(t) are fitted using $\varphi_2$ and its shifted images.
Due to the vanishing of these functions none of the fittings
interfere. The function constructed by this process must lie
within the band and have the same values and derivatives as the
original function f(t) at alternate Nyquist points. That there
is only one such function can be shown by arguments similar to
those used in the basic sampling theorem, generalized by
breaking down the spectrum into an even and an odd part.
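
The value-and-derivative reconstruction can be tried numerically. The sketch below is an illustration under stated assumptions, not a transcription of the memorandum: it takes the two basic functions to be $\varphi_1(t) = \sin^2 \pi Wt/(\pi Wt)^2$ and $\varphi_2(t) = t\,\varphi_1(t)$, uses a hypothetical band-limited test function (a shifted copy of $\varphi_1$), and truncates the infinite sum at a finite number of terms. All names and parameter values are illustrative.

```python
import math

def sinc(x):
    """sin(pi x) / (pi x), with the limiting value 1 at x = 0."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_prime(x):
    """Derivative of sinc(x) with respect to x."""
    return 0.0 if x == 0.0 else (math.cos(math.pi * x) - sinc(x)) / x

W = 1.0        # band limit in cycles per second (hypothetical value)
OFFSET = 0.3   # hypothetical shift, so the test function is not itself a basis function

def phi1(t):
    """Value is 1 and slope 0 at t = 0; value and slope vanish at t = k/W, k != 0."""
    return sinc(W * t) ** 2

def phi2(t):
    """Value is 0 and slope 1 at t = 0; value and slope vanish at t = k/W, k != 0."""
    return t * phi1(t)

def f(t):
    """Test function confined to the band W."""
    return sinc(W * (t - OFFSET)) ** 2

def f_prime(t):
    x = W * (t - OFFSET)
    return 2.0 * W * sinc(x) * sinc_prime(x)

def reconstruct(t, terms=400):
    """Sum over alternate Nyquist points t_k = k/W of ordinate and derivative fits."""
    total = 0.0
    for k in range(-terms, terms + 1):
        tk = k / W
        total += f(tk) * phi1(t - tk) + f_prime(tk) * phi2(t - tk)
    return total

for t in (0.17, 2.45, -1.6):
    print(t, f(t), reconstruct(t))
```

The ordinate terms and the derivative terms do not interfere at the sample points, which is the property the text relies on.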


It is possible to carry this further and determine a
function from knowledge of its value and first (n - 1)
derivatives at points separated n Nyquist intervals apart. In
this case the basic functions are

[The equations for the basic functions $\varphi_s$ are illegible in this copy; each involves $\sin^n(2\pi Wt/n)$ divided by a power of $2\pi Wt/n$.]

These  functions  possess  the  properties: 

1. They lie within the band W.

2. They vanish at $t = \frac{Kn}{2W}$, $K = \pm 1, \pm 2, \ldots$
(that is at n-th Nyquist points) and also their
1st, 2nd, ..., (n-1)th derivatives.

3. At t = 0, all derivatives of $\varphi_s$ vanish except the s-th
derivative which is 1.

Consequently we can reconstruct f(t) by using $\varphi_s$ to
adjust the s-th derivatives (s = 0, 1, ..., n-1) and these
adjustments will not interfere.

The functions $\varphi_s$ and their spectra are shown in Fig. 1
for the cases n = 1, 2, 3.



March 4, 1948


The  Normal  Ergodic  Ensembles  of  Functions 

Among the possible probability distributions in a
one-dimensional space certain ones are of special importance because
of their simple mathematical properties and frequent occurrence
in the physical world. The most important of these is the
normal or Gaussian distribution with a density function:

$$\frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\, x^2/\sigma^2\right)$$

In an n-dimensional space the most important distribution
function is an n-dimensional generalization of this, the
n-dimensional normal distribution:

$$\frac{|a_{ij}|^{1/2}}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2} \sum a_{ij}\, x_i x_j\right)$$

Here $a_{ij}$ is the associated quadratic form and $|a_{ij}|$ the
determinant of this form. This form is positive definite and
the surfaces of constant probability are found by setting
the argument of the exponential function equal to a constant

$$\sum a_{ij}\, x_i x_j = C$$

and are therefore coaxial ellipsoids in the space. The
directions of the axes of this ellipsoid are those of the
eigenvectors of the form $a_{ij}$ and the lengths are inversely proportional
to the corresponding eigenvalues. By a rotation of axes the new
coordinate system can be lined up with these directions and the
distribution function reduced to

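$$\frac{(\lambda_1 \lambda_2 \cdots \lambda_n)^{1/2}}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2} \sum \lambda_i\, y_i^2\right)$$

where the $\lambda_i$ are the (positive) eigenvalues and the $y_i$ are the
new coordinates. The form $a_{ij}$ being positive definite has an
inverse $A_{ij}$ which is also positive definite with eigenvalues
$1/\lambda_i$.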

The properties of the n-dimensional normal distribution
which give it particular mathematical importance are the
following:

1. If $x_i$ and $y_i$ are two chance vector variables, which
are independent and distributed according to n-dimensional
normal distributions with quadratic forms $a_{ij}$ and $b_{ij}$ (inverses
$A_{ij}$ and $B_{ij}$), then the chance vector variable $z_i = x_i + y_i$ is
also distributed normally with the form $c_{ij}$ whose inverse is

$$C_{ij} = A_{ij} + B_{ij}.$$

2. If $x_i$ is a normally distributed vector variable and
$y_j = \sum_i r_{ji}\, x_i$ is a vector variable which is a linear operation
on $x_i$ (possibly of smaller dimension than n) then $y_j$ is normally
distributed with the inverse form

$$B_{ij} = \sum_{s,t} r_{is}\, r_{jt}\, A_{st}.$$

3. Under certain quite broad conditions the resultant of
a large number of small chance vector variables $x_i^s$ (s = 1, 2, ..., N)
with arbitrary distribution functions, which are independent,
gives a normal distribution for

$$\sum_{s=1}^{N} x_i^s$$

providing no term of the sum contributes more than a small
fraction to any B.

4. If the a priori probabilities for each of two
independent vectors $x_i$ and $y_i$ are both normal, the a posteriori
probability of $x_i$ when we know the sum $x_i + y_i = z_i$ is
normally distributed (about a displaced mean, however).

5. The mean value of $x_i x_j$ for $x_i$ normal is given by

$$\overline{x_i x_j} = A_{ij}.$$
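
Properties 1 and 5 together say that the second moments of a sum of independent normal vectors are the sums of the inverse forms. The sketch below is a small Monte Carlo illustration under stated assumptions: two dimensions, zero means, and hypothetical inverse forms A and B chosen only for the example. The helper names are illustrative.

```python
import math
import random

def cholesky_2x2(c):
    """Lower-triangular factor of a 2 x 2 positive definite matrix."""
    l11 = math.sqrt(c[0][0])
    l21 = c[1][0] / l11
    l22 = math.sqrt(c[1][1] - l21 * l21)
    return l11, l21, l22

def draw(c, rng):
    """One zero-mean normal vector whose second moments are given by c."""
    l11, l21, l22 = cholesky_2x2(c)
    g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return l11 * g1, l21 * g1 + l22 * g2

def second_moments(points):
    """Sample mean of x_i x_j over the given points."""
    n = len(points)
    m = [[0.0, 0.0], [0.0, 0.0]]
    for pt in points:
        for i in range(2):
            for j in range(2):
                m[i][j] += pt[i] * pt[j] / n
    return m

# Hypothetical inverse forms (second-moment matrices) for x and y.
A = [[2.0, 0.6], [0.6, 1.0]]
B = [[1.0, -0.3], [-0.3, 0.5]]

rng = random.Random(0)
sums = []
for _ in range(20000):
    x = draw(A, rng)
    y = draw(B, rng)
    sums.append((x[0] + y[0], x[1] + y[1]))

estimate = second_moments(sums)
expected = [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]
print("estimated second moments of z:", estimate)
print("A + B:", expected)
```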

Among the many possible ergodic ensembles of functions
$f_\alpha(t)$ there is also a certain class of particular mathematical
and physical importance. This class of ensembles can be
considered a generalization of the n-dimensional normal distribution
to infinite dimensional function spaces ergodic under
translations in time. We shall call these normal ergodic ensembles
of functions. They are completely specified by giving their
power spectra $P(\omega)$ or their autocorrelation functions $A(t)$
which are the Fourier transforms of the power spectra. The
normal ergodic ensembles can be defined in various ways. They
occur physically when we pass a thermal noise through a filter,
shaping the power spectrum to $P(\omega) = |Y(\omega)|^2$, $Y(\omega)$ being the
admittance of the filter.

In the literature on noise these ensembles are often
treated in a loose, somewhat illogical fashion by using either
of two "representations." The first representation is

$$\sum_n \sqrt{P(n\Delta f)\,\Delta f}\; \cos(n\Delta f\, t + \theta_n).$$

The $\theta_n$ are all uniformly and independently distributed over all
values from 0 to $2\pi$. This representation amounts to making the
noise the sum of a large number of small sinusoidal waves with
random phases, and amplitudes adjusted to give the proper power
density in any small frequency range. The frequency increment
between adjacent waves $\Delta f$ is supposedly very small and in use
one evaluates any desired statistic of this set of functions and
determines the limit approached by this statistic as $\Delta f \to 0$.
This limit is taken to be the desired statistic of the normal
ergodic ensemble. The second representation is similar but uses
normally distributed amplitudes $a_n$ whose variance $\sigma^2$ is equal
to $P(\omega)$

$$\sum_n a_n\, \Delta f\, \cos(n\Delta f\, t + \theta_n).$$

Actually these "representations" will not give the
correct answer in all cases. For example, if we ask what
fraction of the functions in the representation ensemble
are periodic, we find that all are, so the probability is unity,
and the limit as $\Delta f \to 0$ is also therefore unity, while almost
none of the functions in the ergodic normal ensemble are periodic.
However it can be shown that if we restrict ourselves to what we
have called physical statistics, the answer will be identical;
the normal ergodic ensemble is the physical limit of either of
the above ensembles as $\Delta f \to 0$.
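
The periodicity objection is easy to see numerically. The following sketch is an illustration only: it assumes a hypothetical power spectrum $P(\omega) = 1/(1 + \omega^2)$, builds one member of the random-phase representation with a finite number of terms, and checks that it repeats exactly after the period $2\pi/\Delta f$, however the phases are drawn.

```python
import math
import random

def representation(t, power, delta_f, phases):
    """One member of the random-phase ensemble: a sum of sinusoids at n * delta_f."""
    total = 0.0
    for n, theta in enumerate(phases, start=1):
        amplitude = math.sqrt(power(n * delta_f) * delta_f)
        total += amplitude * math.cos(n * delta_f * t + theta)
    return total

# Hypothetical power spectrum, used only for this illustration.
power = lambda w: 1.0 / (1.0 + w * w)

delta_f = 0.1
rng = random.Random(1)
phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(50)]

period = 2.0 * math.pi / delta_f
t = 1.234
value_now = representation(t, power, delta_f, phases)
value_later = representation(t + period, power, delta_f, phases)
print(value_now, value_later)
```

Every member of the representation ensemble is periodic in this way, for any finite frequency increment.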

A  more  logical  definition  of  a  normal  ergodic  ensemble 
can  be  given  as  follows.    We  divide  the  frequency  range  up  into 
unit  intervals  and  construct  the  sequence  of  "flat"  ensembles 
for  these  intervals.    These  will  be  given  by 

$$\sum_n a_n \sin nt.$$

These  ensembles  are  passed  through  shaping  filters  to  give  the 
proper  power  spectrum  in  the  interval  in  question  and  the  results 

The  normal  ergodic  ensembles  have  properties  analogous 
to  the  n-dimensional  normal  distributions  which  we  have  given. 
We  have 

Theorem: The sum of two functions $f_\alpha(t) + g_\beta(t)$ where f and g
are from normal ergodic ensembles with spectra $P_1$
and $P_2$ is normal ergodic with spectrum $P_1 + P_2$.

Theorem: The output of any linear invariant transducer driven
by a normal ergodic ensemble is normal ergodic with
spectrum $|Y(\omega)|^2 P(\omega)$.

Theorem: Any finite dimensional linear operation on a normal
ergodic ensemble gives a normally distributed vector.

March 15, 1948



Systems Which Approach the Ideal as $P/N \to \infty$

We will show that it is possible to construct an
instantaneous system for sufficiently large $P/N$ for transmitting
a sequence of binary digits such that the frequency of errors
is arbitrarily small and the power required only slightly
greater in db than the ideal for the corrected rate of
transmission. More precisely we have the

Theorem: Given any $\epsilon > 0$ and $\delta > 0$ we can transmit binary digits
on an instantaneous basis with frequency of errors
$< \epsilon$ and corrected rate of transmission

$$R > W \log\left[1 + (1 - \delta)\frac{P}{N}\right]$$

The system to be used is of PCM type with an extremely large
number of amplitude levels. Let there be $2^s$ levels, and number
them with a binary notation, but in the Stibitz type code, so
that only one binary digit changes on going to an adjacent
level. If we are in error by d levels, at most d binary digits
of the s will be incorrect. If there are many levels in the $\sigma$
distance ($\sqrt{N}$) of the noise the expected number of errors will
be approximately



We take s large enough so that $\epsilon s > a$. Thus the frequency of
errors in our final result will be $< \epsilon$. The levels should not
be spaced uniformly but according to the density of a normal
distribution. If this is done the received signal will be
nearly Gaussian with $\sigma = \sqrt{P + N}$ and the corrected rate of
transmission

$$R > W \log\left[1 + (1 - \delta)\frac{P}{N}\right]$$
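
The level-numbering property used in the argument can be checked directly. The sketch below assumes that the "Stibitz type code" is what is now usually called a reflected binary (Gray) code; that identification, the formula `k ^ (k >> 1)`, and the choice of s = 6 are assumptions made for illustration, not statements from the memorandum.

```python
def gray(k):
    """Reflected binary code of level k (assumed form of the code in the text)."""
    return k ^ (k >> 1)

def bit_errors(sent_level, received_level):
    """Number of binary digits that differ between the codes of two levels."""
    return bin(gray(sent_level) ^ gray(received_level)).count("1")

s = 6                # hypothetical number of binary digits per level
levels = 2 ** s

# Adjacent levels differ in exactly one binary digit.
adjacent_ok = all(bit_errors(k, k + 1) == 1 for k in range(levels - 1))

# An error of d levels corrupts at most d binary digits.
bounded_ok = all(
    bit_errors(a, b) <= abs(a - b)
    for a in range(levels)
    for b in range(levels)
)

print("adjacent levels differ in one digit:", adjacent_ok)
print("d-level error gives at most d digit errors:", bounded_ok)
```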

C. E. SHANNON

March 29, 1948


Theorems on Statistical Sequences

If it is possible to go from any state with P > 0
to any other along a path of probability p > 0, the system is
ergodic and the strong law of large numbers can be applied.
Thus the number of times a given path $p_{ij}$ in the network is
traversed in a long sequence of length N is about proportional
to the probability of being at i and then choosing this path,
$P_i p_{ij} N$. If N is large enough the probability of percentage
error $\pm\delta$ in this is less than $\epsilon$ so that for all but a set of
small probability the actual numbers lie within the limits

$$(P_i p_{ij} \pm \delta) N.$$

Hence the probability of nearly all sequences (those lying within
these limits) is given by

$$p = \prod p_{ij}^{(P_i p_{ij} \pm \delta) N}$$

and $\frac{\log p}{N}$ is limited by

$$\frac{\log p}{N} = \sum (P_i p_{ij} \pm \delta) \log p_{ij}$$

or

$$\left| \frac{\log p}{N} - \sum P_i p_{ij} \log p_{ij} \right| < \eta.$$

Thus we have:

Theorem: For almost all sequences

$$\lim_{L \to \infty} \left(-\frac{\log p}{L}\right) = H = -\sum P_i p_{ij} \log p_{ij}$$

where p is the probability of the sequence having the block
of length L starting at the first position.
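
The theorem can be illustrated by simulation. The sketch below is an assumption-laden example, not part of the memorandum: it uses a hypothetical two-state source whose transition probabilities $p_{ij}$ are chosen arbitrarily, computes H from the state probabilities $P_i$, and compares it with $-\log p / L$ for one long simulated sequence (conditional on the starting state, which does not matter in the limit). Logarithms are to base 2.

```python
import math
import random

# Hypothetical transition probabilities p_ij of a two-state ergodic source.
P_TRANS = [[0.9, 0.1],
           [0.4, 0.6]]

def state_probabilities(p):
    """Equilibrium probabilities P_i of a two-state chain."""
    a, b = p[0][1], p[1][0]
    return [b / (a + b), a / (a + b)]

def entropy(p):
    """H = - sum over i, j of P_i p_ij log2 p_ij."""
    big_p = state_probabilities(p)
    return -sum(
        big_p[i] * p[i][j] * math.log2(p[i][j])
        for i in range(2)
        for j in range(2)
        if p[i][j] > 0.0
    )

def minus_log_prob_per_symbol(p, length, rng, start=0):
    """Simulate a sequence and return -log2(probability of the sequence) / length."""
    state = start
    total = 0.0
    for _ in range(length):
        nxt = 0 if rng.random() < p[state][0] else 1
        total -= math.log2(p[state][nxt])
        state = nxt
    return total / length

H = entropy(P_TRANS)
observed = minus_log_prob_per_symbol(P_TRANS, 200000, random.Random(7))
print("H:", H)
print("-log p / L for one long sequence:", observed)
```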

Thus for all but a set of blocks of probability $< \epsilon$
and for M large enough

$$(H - \eta)M < -\log p < (H + \eta)M$$

$$\sum p\,(H - \eta)M < -\sum p \log p < \sum p\,(H + \eta)M$$

where we have summed over all but the set of small probability,

$$\sum p\,(H + \eta)M = (H + \eta)M \sum p \le (H + \eta)M$$

and

$$\sum p\,(H - \eta)M = (H - \eta)M \sum p \ge (H - \eta)M(1 - \epsilon).$$

For the set of small probability

•I  p  log  p 

^  log  ^ 

since  this  is  maximised  f or  ip  •  t  by  making  all  p  equal,  and 
the  number  of  them  1  -Jj  •    But  this  is  dominated  by 

•  l  P  log  p|   £    |«W  lo«  | 

1  •» 

with  «  as  snail  as  d« sired  for  sufficiently  large  K  and  small  c. 
Hence this does not affect the sum in the limit as $L \to \infty$ and
we have the

Theorem:

$$\lim_{L \to \infty} \left(-\frac{1}{L} \sum p(B_L) \log p(B_L)\right) = H$$

where $p(B_L)$ is the probability of block $B_L$ of length L, and
the sum is over all possible blocks.

We now prove the

Theorem:

$$H = -\sum p(B_i, S_j) \log p_{B_i}(S_j) = \lim \left(-\sum q(B_i, S_j) \log q_{B_i}(S_j)\right)$$

where $p(B_i, S_j)$ is the probability of block $B_i$ followed by $S_j$ and
$p_{B_i}(S_j)$ is the conditional probability of $S_j$ after the block $B_i$
is known to occur. $q(B_i, S_j)$ is the probability when $B_i$ is
computed on the basis of any initial state probabilities, not
necessarily the proper ones, and $q_{B_i}(S_j)$ the corresponding
conditional probabilities.

The first equality is true since we may sum first on
all $B_i$ leading to a given state K. The terms $p_{B_i}(S_j)$ are then
all equal to $p_{Kj}$ and the terms $p(B_i, S_j)$ sum to $P_K p_{Kj}$, which gives the
desired result.

If the q's are used, the $q_{B_i}(S_j)$ are still $p_{Kj}$ where
K is the state in which $B_i$ ends.

*      qU-.S.)    •    pkj      i.  P(B1) 

since any initial distribution tends toward equilibrium.

We  hare  shown  that  apart  from  a  set  of  small  probability, 
the  probabilities  of  blocks  of  length  L  lie  within  the  limits 

-(H  -  S)M  .(H  ♦  S)M 

*  <  S>  <  2 

where δ can be made small by taking N large enough. Let the
maximum number of blocks of length N when we delete a set of
measure ε be G_ε(N). Then:

I        p  -  (1  -  t) 

Q  (I)  p       -  Q  (M)  2*lH  *  *)M 
t         max  c 

log  0tl«)  >  (H  ♦  6)M  ♦  log(l  -  t) 


log  0  (li) 
Lim   S         -    %U)  £  8 

I  -CO  II 


1  >  I  p  >  GC(K)  pj^B 

from which we obtain

log  0 


•U)  *  H 

Hence we have

Theorem: ν(ε) = H for ε ≠ 0, 1

The fact that for large N nearly all blocks have a
probability limited by

ri°JLE  ♦  s 

<  * 

does not imply that those probabilities approach equality.
In fact they will generally diverge from one another, but the
db range becomes small compared to N, since for p's satisfying


this  inequality 

*»«  Pmax     lQg  Pmln  m  log  _ 
I  II  1 

It is possible to show, however, that there exists among the
blocks of length N a subset, all of equal probability, which
have the same growth with N as the set including all blocks
except those of small probability totaling less than ε: namely,
the subset will contain more than 2^{(H - δ)N} elements with δ
arbitrarily small.

Consider all blocks beginning in a given state, say
state 1, and ending in this state. Let these blocks B_1,
B_2, ... have lengths n_1, n_2, ..., n_i, ... and conditional
probabilities p_1, p_2, ..., p_i, ... when we start from state 1.

We  first  prove 


Theorem:  I  p^n^  •  p^ 

The first part is true since the ergodic character of the system
makes the inverse frequency of occurrence of state 1 equal
to the mean distance between its occurrences, Σ p_i n_i. The
second part is true since almost all blocks of large length N
have approximately the proper frequency of each B_i.

Now we return to the construction of a subset of growth
2^{(H - δ)N}, all of equal probability. Let us choose integers
a_i as close as possible to

and construct sequences with a_i of the block B_i. The number
of blocks is then

and  the  number  of  sequences: 

»  <-  Pt  log  pt 

The growth is then, in terms of symbols,

lag*  . ,  *  4*  . 

This proves the following:

Theorem: Given δ > 0 there exists a set of M blocks of length N
(when N is sufficiently large) such that

\[ M > 2^{(H - \delta)N} \]

and each block has the same probability, and starts and ends in
the same state, which can be chosen arbitrarily.

In case the system is not ergodic but made up of a
finite number of ergodic systems:

\[ \Gamma = \sum_i \alpha_i \Gamma_i \]

each Γ_i will have a rate H_i, which we may assume arranged in a
non-increasing sequence.

The function ν(ε) then becomes a decreasing step function in the
manner indicated by the following:

Theorem: In the case considered

\[ \nu(\epsilon) = H_j \quad \text{in the interval} \quad \sum_{i<j} \alpha_i < \epsilon < \sum_{i \le j} \alpha_i \]

For if ε is in the range indicated we must take a set
of positive probabilities from at least one of Γ_1, ..., Γ_j.
This gives a growth of type

at least, and can be limited to this by choosing all sequences
The quantity

will be called the mean statistical rate for the system.


April  26,  194* 

Samples  of  Statistical  English 
C. E. Shannon

A  number  of  samples  of  statistical  English  including 
probability structure out to four words are given below. These
were  constructed  by  starting  off  with  three  words  from  a  book. 
These  three  words  are  shown  to  someone  who  fits  them  in  a 
reasonable  English  sentence  and  writes  down  the  word  following 
the  three.    The  first  word  is  then  covered  up  and  the  process 
repeated  with  a  different  person,  etc.    If  the  imagined  sentence 
ends  after  the  added  word,  the  person  writing  the  word  adds  a 
period.    For  samples  bearing  a  title  the  participants  were  told 
that  this  was  the  subject  dealt  with.    These  samples  may  be 
compared  with  those  in  "A  Mathematical  Theory  of  Communication" 
where  less  statistical  structure  is  included. 
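
The procedure described above can be imitated mechanically. The following is only a rough sketch, not the method used for the samples: the human participants are replaced by a table of the words observed to follow each three-word context in a text, and the names `build_model`, `generate` and the tiny `corpus` are invented for illustration.

```python
import random
from collections import defaultdict

def build_model(words, context=3):
    """Map each run of `context` words to the words observed to follow it."""
    model = defaultdict(list)
    for i in range(len(words) - context):
        model[tuple(words[i:i + context])].append(words[i + context])
    return model

def generate(model, seed, length=20, rng=None):
    """Extend `seed` one word at a time, looking only at the last len(seed) words."""
    rng = rng or random.Random(0)
    out = list(seed)
    context = len(seed)
    for _ in range(length):
        followers = model.get(tuple(out[-context:]))
        if not followers:  # context never seen in the text: stop here
            break
        out.append(rng.choice(followers))
    return " ".join(out)

# Hypothetical miniature corpus; a real experiment would need a long text.
corpus = ("the first time it happened without his approval and "
          "the first time it happened he said it could not be done").split()
model = build_model(corpus)
sample = generate(model, corpus[:3], length=10)
```

Covering up the first word and passing the paper to a new person corresponds to `out[-context:]`: each new word depends only on the three words before it.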

The samples given here were obtained, for the most
part, with the aid of J. R. Pierce, B. McMillan, C. C. Cutler
and W. E. Mathews. A few of the samples were obtained from
other  sources  (contemporary  literature,  etc.)  and  are  included 
for  comparison.    The  reader  may  try  his  skill  at  guessing  which 
are  statistically  constructed.    The  true  sources  are  given  at 
the  end. 

1.  This  was  the  first.    The  second  time  it  happened  without 
his  approval.    Nevertheless  it  cannot  be  done.    It  could 
hardly  have  been  the  only  living  veteran  of  the  foreign 
power had stated that never more could happen. Consequently
people seldom try it.

2.  John  now  disported  a  fine  new  hat.    I  paid  plenty  for  the 
food. When cooked asparagus has a delicious flavor suggesting
apples. If anyone wants my wife or any other
physicist  would  not  believe  my  own  eyes.    I  would  believe 
my  own  word. 

3.  That  was  a  relief  whenever  you  be  let  your  mind  go  free 
who  knows  if  that  pork  chop  I  took  with  my  cup  of  tea 
after was quite good with the heat I couldn't smell anything
off it I'm sure that queer looking man in the

4.  In  a  few  days  was  the  minimum  amount  of  money  remaining  to 
the  end.    However  everyone  knows  the  meaning  implied.  It 
was true when Cutler says that we should proceed carefully.
When you love yourself too much. The woman who

5.  Fourscore  and  twenty  years  passed  before  we  could  meet  them 
that  isn't  already  done  should  have  been  a  good  son  is 
going  fast  according  to  the  teacher  of  his  ability.  His 
intelligence  sufficed  for  the  time.    This  cannot  change 


6. Even the killing was atrociously perpetrated by the
cruelest  treatment  that  a  small  boy  jumped  over  the  hedge 
and  buried  her.    A  grave  fault  of  many  approaches  to  the 
furthermost  reaches  of  the  state.    Politics  and  business 
are becoming lost to the

7.  It  is  an  Italian  ox  mouth  dish.    The  only  thing  in  the 
room  is  worms.    I  am  the  director  of  the  seminar.    In  an 
evolving  hemisphere.    C'est  Monsieur  Jardin.    I  am  a 
patient.    Oh  my  dear  Plapsen,  you  are  my  dearest  Klapsen. 

8. He took it with many other matters are more apparent if
they think so. Is there a reason for supposing that
most people don't. Nevertheless sex is absolutely necessary
as though the electron diffraction camera plate up
on the top surface of

9. Fifteen years before the mast, he ever had eaten. Try
it and see. I believe that whatever arises a fund has
been accumulated sufficiently in the near future holds
many surprises. No man can judge his actions by his wife
Susie.

10.  I  forget  whether  he  went  on  and  on.    Finally  he  stipulated 
that  this  must  stop  immediately  after  this.    The  last  time 
I saw him when she lived. It happened one frosty look of
trees  waving  gracefully  against  the  wall.    You  never  can 

11.  When  I  bought  my  wife  a  long  time  ago.    I  knew  that  it 
wasn't  faster  when  he  didn't  eat  or  drink  a  toast  to 
John  Doe,  otherwise  known  as  McMillan's  theorem. 
Whatever  the  nature  of  Christ's  teachings.    Go  far  into 

12.  McMillan's  Theorem 

McMillan's  theorem  states  that  whenever  electrons  diffuse 
in vacua. Conversely impurities of a cathode. No substitution
of variables in the equation relating these
quantities.    Functions  relating  hypergeometric  series 
with  confluent  terms  converging  to  limits  uniformly 
expanding  rationally  to  represent  any  function. 

13. House Cleaning

First  empty  the  furniture  of  the  master  bedroom  and  bath. 
Toilets  are  to  be  washed  after  polishing  doorknobs  the 
rest  of  the  room.    Washing  windows  semi-annually  is  to  be 
taken  by  small  aids  such  as  husbands  are  prone  to  omit 


14.  Epiminondas 

Epiminondas  was  one  who  was  powerful  especially  on  land 
and  sea.    He  was  the  leader  of  great  fleet  maneuvers  and 
open  sea  battles  against  Pelopidas  but  had  been  struck  on 
the  head  during  the  second  Punic  war  because  of  the  wreck 
of  an  armored  frigate. 

15.  Salaries 

Money  isn't  everything.    However,  we  need  considerably 
more  incentive  to  produce  efficiently.    On  the  other  hand 
too little and too late to suggest a raise without a reason
for  remuneration  obviously  less  than  they  need  although 
they  really  are  extremely  meager. 

16.  Murder  Story 

When  I  killed  her  I  stabbed  Claude  between  his  powerful 
jaws  clamped  cruelly  together.    Screaming  loudly  despite 
fatal  consequences  in  the  struggle  for  life  began  ebbing 
as  he  coughed  hallowly  spitting  blood  from  his  ears. 
Burial  seemed  unnecessary  since  further  division  was 

The  sources  are:     3,  from  "Ulysses"  by  James  Joyce, 
page  748;  7  and  14  are  the  conversation  and  writings  of  two 
schizophrenic  patients  (quoted  from  Bleuler,  "A  Textbook  of 
Psychiatry").    All  others  constructed  by  statistical  means. 

C. E. SHANNON

June 11, 1948

The  Department  of  Defense 
Washington  25,  D.  C. 

Prepared  by 





C.  E.  Shannon 
Bell  Telephone  Laboratories 
Murray Hill, N. J.

1.  Introduction. 

A  general  communication  system  is  shown  in  Figure  3.  An  information  source 
produces  a  message.  This  is  encoded  in  a  transmitter  to  produce  a  signal  suitable  for 
transmission  over  the  channel.  During  transmission  the  signal  may  be  perturbed  by 
noise.  The  perturbed  signal  is  decoded  or  demodulated  at  the  receiver  to  recover,  as 
well  as  possible,  the  original  message. 

The  situation  is  roughly  analogous  to  a  transportation  system  for  transporting  physical 
goods  from  one  point  to  another.  We  can  imagine,  for  example,  a  lumber  mill  producing 
lumber  at  an  average  rate  of  R  cubic  feet  per  second  and  a  conveyor  system  capable  of 
transporting  C  cubic  feet  per  second.  If  R  is  greater  than  C  the  full  output  of  the  mill 
cannot  possibly  be  carried  on  the  conveyor.  On  the  other  hand,  if  R  is  less  than  or  equal 
to  C  it  may  or  may  not  be  possible,  depending  on  whether  the  lumber  can  be  efficiently 
packed in the available space of the conveyor. However, if we allow ourselves to saw
the lumber up into suitable sizes and shapes we can always approach 100 per cent efficiency
in packing. In this case we must, of course, supply a carpenter shop at the other
end of the conveyor to reassemble the lumber in its original form before passing it on
to the consumer.

If the analogy is sound we might hope to define two parameters R and C associated
with an information source and a channel, respectively. R should measure, in some
sense, how much information is produced per second by the source, and C the capacity
of the channel when used in the most efficient manner for transmitting information. We
would expect then that if R > C the full output of the source cannot be transmitted satisfactorily.
If R ≤ C it should be possible to transmit the output of the source by proper
encoding and decoding at transmitter and receiver. It turns out that it is possible to
define quantities R and C which measure these information rates and capacities and
satisfy the desired relationships. We will attempt to show how this can be done without,
however, giving mathematical proofs of the results.1

1 For mathematical details, see Shannon, C. E., "A Mathematical Theory of Communication,"
Bell System Technical Journal, July and October, 1948. See also Shannon, C. E.,
"Communication in the Presence of Noise," Proceedings of the I.R.E. (Forthcoming).

2. The Information Source.

The first problem is that of clarifying the nature of "information" and finding a
measure of the rate of production for an information source.

Information involves basically the concept of "choice." An information source
chooses one particular message from a set of possible messages. If there were only
one possible message there would be no communication problem. The amount of information
produced by a source must evidently be related to the range of choice available.

The  simplest  possible  choice  is  a  choice  from  two  equally  likely  possibilities,  say 
0  or  1.  We  shall  call  the  corresponding  unit  of  information  a  binary  digit  or  "bit."  A 
relay or flip-flop circuit has two possible states and is capable of storing one bit of information.

A  device  which  chooses  at  random  from  0  or  1  making  one  choice  each  second  is 
considered  to  be  producing  information  at  rate  R  of  one  bit  per  second.  Such  a  source 
produces a "message" which is a random sequence of 0's and 1's.

A choice from, say, 32 equally likely possibilities can be considered as a series of five
choices, each from two equally likely possibilities, and, therefore, should correspond to
five bits. More generally, a choice from n equally likely possibilities represents
log2 n bits.

Suppose now that the various possible choices have different probabilities of occurrence,
say p1, p2, ..., pn. How much information is produced when a choice is made under
these  circumstances?  One  feels  intuitively  that  less  "choice"  is  involved  in  a  device 
which  chooses  between  0  and  1  with  probabilities  .01  and  .99  than  in  one  which  chooses 
with  equal  probabilities.  In  the  former  case  the  result  is  almost  sure  to  be  1. 

The  following  example  shows  that  by  proper  encoding  an  average  compression  can  be 
obtained by using the probabilities p1, p2, ..., pn. Suppose there are four possible choices
A, B, C, D with probabilities pA = 1/2, pB = 1/4, pC = 1/8, pD = 1/8. If we use a simple
direct  code  into  binary  digits: 

A  =  00       B  =  01       C  =  10       D  =  11, 

we  use  two  binary  digits  per  letter.  On  the  other  hand,  using  the  following  code  where 
more  probable  letters  are  given  short  codes  and  less  probable  letters  longer  codes,  we 
obtain  an  average  saving 

A = 0       B = 10       C = 110       D = 111.

This  is  a  reversible  code;  the  original  text  can  be  recovered  from  the  encoded  sequences 
as  is  readily  verified.  With  this  code  we  need,  on  the  average,  only 

(1/2  x  1  +  1/4  x  2  +  1/8  x  3  +  1/8  x  3)  =  1  3/4 

binary  digits  per  letter.  We  may  say  then  that  a  choice  with  probabilities  1/2,  1/4,  1/8, 
1/8  corresponds  to  1  3/4  bits  of  information.  If  an  information  source  were  producing 
a  sequence  of  the  letters  A,  B,  C,  D  with  these  probabilities  we  could  encode  it  into  a 
sequence of binary digits in which 1 3/4 binary digits are used on the average for each
letter  of  message. 
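
The reversibility of the second code and the figure of 1 3/4 binary digits per letter can be checked with a short sketch. The function names `encode` and `decode` are introduced here only for illustration; the code table and probabilities are those given in the text.

```python
from fractions import Fraction

probs = {"A": Fraction(1, 2), "B": Fraction(1, 4), "C": Fraction(1, 8), "D": Fraction(1, 8)}
code = {"A": "0", "B": "10", "C": "110", "D": "111"}

def encode(text):
    """Concatenate the code words of the letters in `text`."""
    return "".join(code[ch] for ch in text)

def decode(bits):
    """Read bits left to right; no code word is the start of another,
    so the first complete match is always the right letter."""
    inverse = {v: k for k, v in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            out.append(inverse[word])
            word = ""
    if word:
        raise ValueError("bit string ends in the middle of a code word")
    return "".join(out)

# Average number of binary digits per letter: sum of probability times length.
average_length = sum(p * len(code[ch]) for ch, p in probs.items())
```

Here `average_length` comes out as the exact fraction 7/4, and `decode(encode(text))` returns the original text for any string over A, B, C, D.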

A general analysis of the situation shows that if the letters are chosen with probabilities
p1, p2, ..., pn then it is possible to encode into binary digits using

\[ H = -\sum_i p_i \log_2 p_i \]

binary  digits  per  letter  of  message  on  the  average,  and  there  is  no  method  of  reversible 
encoding  using  less.  This  H  then  is  the  equivalent  number  of  bits  per  letter,  and,  if  the 
source  produces  n  letters  per  second,  R  =  nH  is  the  rate  of  production  in  bits  per  second. 
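
As a small numerical illustration, the quantity H and the rate R = nH can be computed directly. This is a sketch only; the function names `entropy_bits` and `source_rate` are not from the text.

```python
import math

def entropy_bits(probs):
    """H = -sum p_i log2 p_i; terms with p_i = 0 contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def source_rate(probs, letters_per_second):
    """R = nH, in bits per second, for a source emitting n letters per second."""
    return letters_per_second * entropy_bits(probs)

h_abcd = entropy_bits([1/2, 1/4, 1/8, 1/8])   # the A, B, C, D example
h_32 = entropy_bits([1/32] * 32)              # 32 equally likely choices
h_skewed = entropy_bits([0.01, 0.99])         # the nearly certain choice
```

For the probabilities 1/2, 1/4, 1/8, 1/8 this gives 1.75 bits, matching the average length of the variable-length code; 32 equally likely choices give 5 bits, and the choice with probabilities .01 and .99 gives well under a tenth of a bit.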


In the case of English text the statistical structure is more involved. There are the
various letter probabilities p_i, but, also, there are statistical influences between nearby
letters. For example, the letter T is more often followed by H than by any other letter,
a Q is almost invariably followed by U, etc. In such cases there is a more general formula
for calculating the equivalent number of bits per letter of message. Let p(i, j, ..., s) be
the probability in the language of the sequence of letters i, j, ..., s. Then we define G_n:

\[ G_n = -\frac{1}{n} \sum_{i, j, \ldots, s} p(i, j, \ldots, s) \log_2 p(i, j, \ldots, s) \]

where the sum is over all sequences of letters which are just n letters long. The sequence
G_1, G_2, ..., G_n, ... represents a series of approximations to the desired H which
takes into account more and more of the statistical structure as we proceed along the
sequence. The information per letter of message can be defined by the limiting value of
the G's.

\[ H = \lim_{n \to \infty} G_n \]


It  can  be  shown  that  H  has  the  desired  properties;  namely,  we  can  encode  the  messages 
from  the  source  into  binary  digits  using  H  binary  digits  per  letter  on  the  average,  and  no 
method  of  encoding  uses  less. 
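
The approximations G_n can be estimated from a sample of text by counting blocks of n letters. The sketch below assumes the per-letter normalization, G_n = -(1/n) Σ p log2 p over n-letter blocks, and uses observed block frequencies in place of the true probabilities; the function name is illustrative. For large n a short sample contains too few distinct blocks, so the estimate falls below the true value.

```python
import math
from collections import Counter

def block_entropy_per_letter(text, n):
    """Estimate G_n from the relative frequencies of n-letter blocks in `text`."""
    blocks = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * math.log2(c / total) for c in counts.values()) / n

# A perfectly periodic "language": single letters look random, pairs do not.
periodic = "AB" * 50
g1 = block_entropy_per_letter(periodic, 1)
g2 = block_entropy_per_letter(periodic, 2)
```

For the periodic text the single-letter estimate is one bit per letter, while the two-letter estimate is already about half of that, showing how longer blocks pick up more of the statistical structure.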

For  the  English  language  H  has  been  estimated  at  roughly  2  bits  per  letter,  taking 
account  only  of  the  statistical  structure  out  to  about  6  or  8  letters. 

If the messages produced by the information source are continuous functions of time,
as in speech or television transmission, the situation is much more involved and we will
not discuss it in detail. It is still possible to assign a rate of production of information
in bits per second to such a source, but the rate now depends on other considerations.
With  continuous  functions  as  messages,  exact  reproduction  is  not  generally  required  and 
the  rate  R  depends  on  the  amount  and  nature  of  the  discrepancy  which  can  be  tolerated 
between  the  original  and  recovered  messages.  The  tolerable  discrepancy  in  turn  is 
determined by the final destination of the messages. With speech, for example, the tolerable
errors depend on the structure of the human ear and brain.

Although  the  mathematical  problems  involved  in  defining  the  rate  for  a  continuous 
source  have  been  completely  solved,  it  is  in  practical  cases  very  difficult  to  estimate  R. 
The  following  calculation  may  be  of  some  interest,  however.  Suppose  we  are  interested 
only in transmitting English speech (no music or other sounds), and the quality requirements
on reproduction are only that it be intelligible as to meaning. Personal accents,
inflections, etc., can be lost in the process of transmission. In such a case we could, at
least in principle, transmit by the following scheme. A device is constructed at the transmitter
which prints the English text corresponding to the spoken words. These can be
translated into binary digits in the ratio of about two binary digits per letter, or 2 × 4.5 = 9
per  word.  Taking  100  words  per  minute  as  a  reasonable  talking  speed  we  obtain  900  bits 
per  minute  or  15  bits  per  second  as  an  estimate  of  the  rate  for  English  speech  when  in- 
telligibility is  the  only  fidelity  requirement. 
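
The arithmetic behind this estimate can be written out as follows. The figure of 4.5 letters per word is an assumption here: it is the average word length that makes two binary digits per letter and 100 words per minute yield the stated 900 bits per minute.

```python
bits_per_letter = 2        # estimated H for English, from the text
letters_per_word = 4.5     # assumed average word length
words_per_minute = 100     # talking speed taken in the text

bits_per_word = bits_per_letter * letters_per_word
bits_per_minute = bits_per_word * words_per_minute
bits_per_second = bits_per_minute / 60
```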

3.  The  Capacity  of  a  Channel. 

We now consider the problem of defining the capacity C of a channel for transmitting
information. Since we have measured the rate of production for an information source in
[...] transmitted over a given channel?

In some cases the answer is simple. With a [...] second, [...]
can send 5n bits per second.

Suppose now that the channel is defined [...] functions
of time f(t) which lie within a certain band W cycles per second wide.
It is known that a function of this type can be specified by giving its values at a series of
equally spaced sampling points 1/2W seconds apart. Thus we may say that such a function
has 2W degrees of freedom, or dimensions, per second.

If there is no noise whatever [...]

Even when there is noise, if we place no limitations [...]
capacity will be infinite for we may [...]
number of different amplitude levels [...] The capacity depends, of
[...]

The simplest type of noise is white [...]
distribution of amplitudes is Gaussian, and [...]
into a unit resistance.

The simplest limitation on transmitter power is [...]
[...] parameters W, P, and N,
the capacity C can be calculated. It turns out to be

\[ C = W \log_2 \frac{P + N}{N} \quad \text{(bits per second).} \]

[...] \sqrt{(P + N)/N}
different amplitudes at each sample point. In a time T there will be 2TW independent
samples. Thus, there are approximately

\[ M = \left( \sqrt{\frac{P + N}{N}} \right)^{2TW} = \left( \frac{P + N}{N} \right)^{TW} \]

different signal functions of duration T that can be distinguished from one another in spite
of the noise. This corresponds to

\[ \log_2 M = TW \log_2 \frac{P + N}{N} \]

binary digits in the time T or

\[ C = W \log_2 \frac{P + N}{N} \]

binary digits per second. This formula has a much deeper and more precise significance
than the above argument would indicate. In fact it can be shown that it is possible,
by properly choosing our signal functions, to transmit W log2 (P + N)/N binary digits per
second with as small a frequency of errors as desired. It is not possible to transmit
binary digits at any higher rate with an arbitrarily small frequency of errors. This
means that the capacity is a sharply defined quantity in spite of the noise. These statements
are proved by two different methods.*

The formula for C applies for all values of P/N. Even when P/N is very small, the
average noise power being much greater than the average transmitter power, it is possible
to transmit binary digits at the rate W log2 (P + N)/N with as small a frequency of
errors as desired. In this case log2 (1 + P/N) is approximated by (P/N) log2 e = 1.443 P/N
and we have approximately

\[ C = 1.443 \, \frac{PW}{N} \]

It  should  be  emphasized  that  it  is  only  possible  to  transmit  at  a  rate  C  over  a  channel 
by  properly  encoding  the  information.  In  general,  the  rate  C  is  only  approached  as  a  limit 
by  using  more  and  more  complex  encoding  and  longer  and  longer  delays  at  both  trans- 
mitter and  receiver.  In  the  white  noise  case  the  best  encoding  is  such  that  the  transmitted 
signals  themselves  have  the  structure  of  a  white  noise  with  power  P.  The  difficulty  with 
the  approximate  argument  given  for  that  case,  and  the  reason  it  does  not  give  a  sharply 
defined capacity, is that the selection of signals is not optimal. The distribution of amplitudes
is not Gaussian as it should be.
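
The white-noise capacity formula and its small P/N approximation can be compared numerically. The following is a sketch; the function names and the sample values of W, P and N are chosen only for illustration.

```python
import math

def capacity(W, P, N):
    """C = W log2((P + N) / N), in bits per second."""
    return W * math.log2((P + N) / N)

def capacity_low_snr(W, P, N):
    """Approximation for small P/N: C ~ W (P/N) log2 e = 1.443 W P/N."""
    return W * (P / N) * math.log2(math.e)

# Equal signal and noise power: one bit per second per cycle of band.
c_equal = capacity(3000, 1.0, 1.0)

# Noise a thousand times stronger than the signal.
c_exact = capacity(1000, 0.001, 1.0)
c_approx = capacity_low_snr(1000, 0.001, 1.0)
```

With P = N the capacity is exactly W bits per second. With P/N = 0.001 the exact and approximate values agree to better than one part in a thousand, and the capacity, though small, is not zero.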

4. Comparison of Ideal and Practical Systems.

In  Figure  4  the  curve  is  the  function 

\[ \frac{C}{W} = \log_2 \left(1 + \frac{P}{N}\right) \]

plotted  against  P/N  measured  in  db.  It  represents,  therefore,  the  channel  capacity  per 
unit of band with white noise. The circle and points correspond to PCM and PPM systems
used to send a sequence of binary digits and adjusted to give about one error in 10^5 binary
digits. In the PCM case the number adjacent to a point represents the number of amplitude
levels; 3, for example, is a ternary PCM system. In all cases positive and negative
amplitudes are used. The PPM systems are quantized with a discrete set of possible
positions for the pulse, the spacing is 1/2W, and the number adjacent to a point is the number
of possible positions for a pulse.

The  series  of  points  follows  a  curve  of  the  same  shape  as  the  ideal  but  displaced 
horizontally about 8 db. This means that with more involved encoding or modulation systems
a gain of 8 db. in power could be achieved over the system indicated.

* See Shannon, C. E., "A Mathematical Theory of Communication" and "Communication
in  the  Presence  of  Noise." 


Of course, as one attempts to approach the ideal, the transmitter and receiver required
become more complicated and the delays increase. For these reasons there will
be some point where an economic balance is established between the various factors.
It may well be, however, that even at the present time more complex systems would be
justified.
A curious fact illustrating the general misanthropic behaviour of Nature is that at
both extremes of P/N (when we are well outside the practical range) the points
in Figure 4 approach more closely the ideal curve. At very large P/N the PCM points
approach to within 10 log10 [...] = 4.5 db. of the ideal, while with very small P/N the PPM
points approach to within 3 db. The relation

\[ C = W \log_2 \left(1 + \frac{P}{N}\right) \]

can be regarded as an exchange relation between the parameters W and P/N. Keeping the
channel capacity fixed we can decrease the bandwidth W provided we increase P/N sufficiently.
Conversely, an increase in band allows a lower signal-to-noise ratio in the
channel. The required P/N in db. is shown in Figure 5 as a function of the band W. It is
assumed here that as we increase W, N increases proportionally:

N  =  W  N0 

where  N0  is  the  noise  power  per  cycle  of  band.  It  will  be  noticed  that  if  P/N  is  large  a 
reduction  of  band  is  very  expensive  in  power.  Halving  the  band  roughly  doubles  the 
signal-to-noise  ratio  in  db.  that  is  required. 
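
The exchange between band and power can be made concrete by solving C = W log2 (1 + P/N) for P/N at fixed C. This is a sketch under that relation and N = W N0; the function names and the sample figures are illustrative and are not read off Figure 5.

```python
import math

def required_snr_db(C, W):
    """P/N in db needed for capacity C bits per second in a band W cycles per second."""
    return 10 * math.log10(2 ** (C / W) - 1)

def required_power(C, W, N0):
    """Transmitter power P needed when the noise power is N = W * N0."""
    return N0 * W * (2 ** (C / W) - 1)

C = 10000.0
snr_full = required_snr_db(C, 1000.0)   # about 30 db
snr_half = required_snr_db(C, 500.0)    # about 60 db after halving the band
```

With C = 10000 bits per second, cutting the band from 1000 to 500 cycles raises the required P/N from roughly 30 db to roughly 60 db, which is the doubling in db noted in the text for large P/N.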

The  channel  capacity  C  can  be  calculated  in  many  other  cases.  A  general  result  that 
applies  in  any  situation  where  the  average  transmitter  power  is  limited  to  P  is  that  the 
channel  capacity  is  bounded  by: 

\[ W \log_2 \frac{P + N_1}{N_1} \le C \le W \log_2 \frac{P + N}{N_1} \]

where N_1 is a parameter called the "entropy power" of the noise. It is defined as the
power in a white noise having the same entropy as the actual noise. N is, as before, the
average  noise  power. 




Nyquist, H.
    "Certain Factors Affecting Telegraph Speed," Bell System Technical Journal,
    April 1924, p. 324.
    "Certain Topics in Telegraph Transmission Theory," A.I.E.E. Transactions,
    Vol. 47, April 1928, p. 617.

Hartley, R. V. L.
    "Transmission of Information," Bell System Technical Journal, July 1928, p. 535.

Shannon, C. E.
    "A Mathematical Theory of Communication," Bell System Technical Journal,
    July, October, 1948.
    "Communication in the Presence of Noise," Proceedings of the I.R.E. (Forthcoming).

Tuller, W. G.
    Sc.D. Thesis, Department of Electrical Engineering, Massachusetts Institute of
    Technology, 1948.

Wiener, N.
    The Interpolation, Extrapolation and Smoothing of Stationary Time Series,
    NDRC Report (Forthcoming as a book to be published by John Wiley and Sons,
    Inc., New York).
    Cybernetics. John Wiley and Sons, Inc., New York, 1948.

Bailey, R. D., and Singleton, H. E.
    "Reducing Transmission Bandwidth," Electronics, August 1948, p. 107.



Note  on  Certain  Transcendental  Numbers 
Claude  E.  Shannon 

This  note  calls  attention  to  a  certain  class  of 
numbers  that  are  easily  shown  to  be  transcendental  but  seem 
to have escaped previous notice. A typical example is the number

\[ X = 2^{-2^{-2^{\cdot^{\cdot^{\cdot}}}}} \]

or more precisely X = \lim_{n \to \infty} X_n, with X_{n+1} = 2^{-X_n} and X_0 = 2. It is
easily seen that X exists and satisfies the equation X = 2^{-X}.
It is known from a conjecture of Hilbert, proved by Gelfond
and by Schneider, that a^x is transcendental if a ≠ 0, 1 is
algebraic and x is an algebraic irrational. Now X is clearly
not rational, and if we suppose it an algebraic irrational,
it must then be transcendental, a contradiction. Hence it is
transcendental.
More generally let f be a function such that if
x is algebraic and does not belong to a set S, then f(x) is
transcendental. Let g_1 and g_2 be algebraic functions and
such that x ≠ g_1 f g_2 x for x ∈ S. Then the solutions of

\[ x = g_1 f g_2 x \]

are transcendental by a similar argument, using the fact that
g_2 is algebraic. If the sequence X_n = (g_1 f g_2)^n X_0 approaches
a limit X it must be transcendental. Some functions known to
have the property required for f are sin x, e^x and J_0(x), the
exceptional  set  S  consisting  of  the  number  0. 
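
The typical example can be evaluated numerically. The sketch below assumes that the damaged recursion in the scan reads X_{n+1} = 2^{-X_n} with X_0 = 2; the function name `iterate` is illustrative. The computation only exhibits the limit and says nothing about its transcendence.

```python
def iterate(x0=2.0, steps=200):
    """Apply x -> 2**(-x) repeatedly, starting from x0."""
    x = x0
    for _ in range(steps):
        x = 2.0 ** (-x)
    return x

X = iterate()
residual = abs(X - 2.0 ** (-X))
```

The iterates oscillate around the limit and settle near 0.641, where the residual of X = 2^{-X} is at the level of rounding error.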


October  27,  1948 

A CASE OF EFFICIENT CODING FOR A NOISY CHANNEL

Consider a discrete channel with two possible symbols
0 and 1. Noise is assumed to affect successive symbols independently
and in such a way that the probability of a symbol
being interpreted correctly at the receiver is p = [...], while
the probability of incorrect interpretation is q = [...]. The
capacity of such a channel is

-  e2 

We assume ε very small and approximate log (1 + ε) by [...]


*  e2  (natural  units) 
In bits per symbol, the capacity is:

C  -      log*,  a 

A very simple code can be constructed for this system
to send a sequence of random binary digits at nearly the rate C
with a quite small frequency of errors; in other words a code
which is not far from the ideal. The code is merely to repeat
each binary digit in the message a large number n of times. At
the receiver, a group of n is received, and the majority report
is taken as the original message symbol.

If the message symbol is 0 then n 0's are transmitted.
At the receiver the n received symbols will be a mixture of
0's and 1's; the number of 0's present will be distributed according
to a binomial distribution with p = [...] and q = [...].

For large n the binomial distribution is approximately normal
(and this approximation is especially good when p is close to
1/2). The expected number of 0's is pn, and the standard deviation
is \sqrt{npq}.

An error occurs when the number of received 0's is less than n/2,
i.e. when the actual number of zeros is pn - n/2 away from
the expected number. In terms of σ this is:

\[ a = \frac{pn - n/2}{\sqrt{npq}} \] standard deviations.

Hence the frequency of errors is given by the area of a normal
curve with standard deviation equal to unity from a out to ∞.

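To obtain a frequency of errors of $10^{-3}$, say, we must
have $a \approx 3.1$, that is $n \approx 9.6/\varepsilon^2$,

and the rate is $\frac{1}{n} \approx \frac{\varepsilon^2}{9.6}$ as compared with the rate $\frac{\varepsilon^2}{1.39}$ for the
ideal (with essentially zero frequency of errors).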
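The comparison between the repetition code and the ideal can be checked numerically. The following is only a sketch under stated assumptions: it takes p = (1 + eps)/2 as the probability of correct reception, and the function names (bsc_capacity_bits, majority_error, normal_tail) and the sample values eps = 0.2, n = 239 are illustrative choices, not taken from the memorandum.

```python
import math

def bsc_capacity_bits(eps):
    """Capacity in bits per symbol of a binary channel with p = (1 + eps) / 2 (assumed form)."""
    p, q = (1 + eps) / 2, (1 - eps) / 2
    return 1 + p * math.log2(p) + q * math.log2(q)

def majority_error(n, eps):
    """Exact probability that fewer than half of n (odd) repeated symbols arrive correctly."""
    p, q = (1 + eps) / 2, (1 - eps) / 2
    # Sum the binomial terms for k = 0 .. (n - 1) / 2 correct symbols.
    return sum(math.comb(n, k) * p**k * q**(n - k) for k in range((n + 1) // 2))

def normal_tail(a):
    """Area of a unit normal curve from a out to infinity."""
    return 0.5 * math.erfc(a / math.sqrt(2))

if __name__ == "__main__":
    eps, n = 0.2, 239                      # chosen so that eps * sqrt(n) is about 3.1
    a = eps * math.sqrt(n)
    print("a =", a)
    print("normal estimate of error frequency:", normal_tail(a))
    print("exact binomial error frequency:   ", majority_error(n, eps))
    print("rate of repetition code 1/n:      ", 1 / n)
    print("capacity (bits per symbol):       ", bsc_capacity_bits(eps))
    print("small-eps estimate of capacity:   ", 0.5 * eps**2 * math.log2(math.e))
```

Running the script prints the normal estimate next to the exact binomial tail, and the rate 1/n next to the capacity, so the size of the gap between the simple code and the ideal can be read off for any chosen eps.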

November 18,

C. E. SHANNON

December 6, 1948

Note  on  Reversing  A  Discrete  Markhoff  Process 

In  "A  Mathematical  Theory  of  Communication"  a 
language  was  represented  by  a  discrete  Markhoff  process  with 
a  finite  number  of  possible  states.    Such  a  stochastic  process 
can  be  represented  schematically  by  means  of  an  oriented  linear 
graph  as  in  Fig.  1 

Consider  the  question  of  generating  the  same  language 
in  reverse;  for  example,  English  but  read  backwards.    Can  we 
always  invert  a  finite  state  Markhoff  process  and  obtain  a 
finite state Markhoff process? The answer is "yes" and furthermore
the corresponding linear graph has the same topology, but
with reversed orientation on all branches. If the
original process has state probabilities $P_i$ and transition
probabilities $p_i(j)$ (probability when in state
i of going to state j), then the reverse process has the same
state probabilities and the transition probabilities given by:

$$q_j(i) = \frac{P_i}{P_j}\, p_i(j)$$


This is true since this $q_j(i)$ is merely the a posteriori probability
for  the  original  process  that  when  in  state  j  the  preceding  state 
was  state  i.    The  inverse  of  Fig.  1  is  shown  in  Fig.  2. 

It is interesting to show directly that the entropy
$H_R$ of the reverse process is equal to the entropy $H_F$ of the
forward process. Of course, this must be true a posteriori from
the general properties of entropy. We have

$$P_j\, q_j(i) = P_i\, p_i(j)$$

Hence:

$$\sum_{i,j} P_j q_j(i) \log P_j q_j(i) = \sum_{i,j} P_i p_i(j) \log P_i p_i(j)$$

$$\sum_{i,j} P_j q_j(i) \log q_j(i) + \sum_{i,j} P_j q_j(i) \log P_j = \sum_{i,j} P_i p_i(j) \log p_i(j) + \sum_{i,j} P_i p_i(j) \log P_i$$

$$-H_R + \sum_j P_j \log P_j = -H_F + \sum_i P_i \log P_i$$
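The reversal rule and the equality of the two entropies can be checked on a small example. This is a minimal sketch, assuming the reverse transition probabilities are P_i p_i(j) / P_j as in the note; the three-state matrix and the helper names (stationary, reverse_process, entropy) are hypothetical and serve only as an illustration.

```python
import math

def stationary(p, iters=2000):
    """State probabilities P_i found by repeatedly applying the transition matrix."""
    n = len(p)
    P = [1.0 / n] * n
    for _ in range(iters):
        P = [sum(P[i] * p[i][j] for i in range(n)) for j in range(n)]
    return P

def reverse_process(p, P):
    """q[j][i] = P_i p_i(j) / P_j: probability that state j was preceded by state i."""
    n = len(p)
    return [[P[i] * p[i][j] / P[j] for i in range(n)] for j in range(n)]

def entropy(t, P):
    """Entropy in bits per symbol of a process with transition matrix t and state probabilities P."""
    n = len(t)
    return -sum(P[a] * t[a][b] * math.log2(t[a][b])
                for a in range(n) for b in range(n) if t[a][b] > 0)

# Hypothetical three-state process; p[i][j] is the probability of going from i to j.
p = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.4, 0.0, 0.6]]
P = stationary(p)
q = reverse_process(p, P)

if __name__ == "__main__":
    print("state probabilities:", P)
    print("forward entropy H_F:", entropy(p, P))
    print("reverse entropy H_R:", entropy(q, P))
```

Each row of q sums to one because the P_i are stationary, and the two printed entropies agree to rounding error, which is the identity the last displayed equation leads to.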



Outline  of  Talk 
American  Statistical  Society,  December  28,  1949 


C. E. Shannon

Bell Telephone Laboratories, Inc., Murray Hill, N. J.

1. Information Produced by a Stochastic Process

In communication engineering, we are interested in
transmitting messages from one point to another. The messages
generally consist of a sequence of individual symbols, such as
the letters of printed English, which are governed by
probabilities. Thus, in English, there are the various letter
frequencies, digram frequencies, etc. The "meaning" of the
message (if any) is irrelevant to the engineering problem.
Abstractly,  then,  we  may  consider  a  message  to  be  a  sequence  of 
meaningless  symbols  produced  by  a  suitable  Stochastic  process. 
Communication  systems  must  be  designed  to  handle  the  ensemble 
of  possible  messages;  the  particular  one  which  will  actually 
occur  is  not  known  when  the  system  is  constructed.    The  source 
producing  messages  is  assumed  to  have  only  a  finite  number  of 
possible  internal  states. 

2. Entropy as a Measure of Information

A suitable measure of the amount of information produced
by a discrete Stochastic process is given by the entropy
H, where

$$H = -\lim_{N \to \infty} \frac{1}{N} \sum p(x_1, \ldots, x_N) \log_2 p(x_1, \ldots, x_N)$$

in which $x_1, \ldots, x_N$ is a sequence of N symbols produced by
the process, $p(x_1, \ldots, x_N)$ is the probability of this sequence,
and the sum is over all sequences of this length.

The significance of the quantity H is that it is possible
to translate messages from a source with entropy H into a
sequence of binary digits (0 or 1) using, on the average, H + ε
binary digits per letter of the original message with any
positive ε. It is not possible to translate so that fewer are
used. Thus H measures, in a sense, the equivalent number of
binary digits per letter of message. It can be shown that H
also determines the amount of channel capacity required for
transmission of the original messages.

It is also possible to define the entropy, $H_x(y)$, of one source
relative to another. This
measures in a sense the uncertainty per letter of the y sequence
when the x sequence is known, or the amount of additional information
in the y sequence over that available in the x sequence.
$H_x(y)$ can be defined as follows:

$$H_x(y) = H(x, y) - H(x)$$

where H(x, y) is the entropy of the sequence whose elements are
the ordered pairs (x, y).

3.    The  Nature  of  Information 

While the entropy H measures the amount of information
produced by a Stochastic process, it does not define the information
itself. Thus two entirely different sources might
produce information at the same rate (same H) but certainly they
are not producing the same information. If we translate the
output of a particular source into a different "language" by a
reversible operation, the translation may be said to have the
same information as the original. Thus we are led to consider
the information of a Stochastic process as that which is common
to all translations obtained from the given process by members
of the group G of reversible translations, or, alternatively, as
the equivalence class of all processes obtained from the given
one by such translations. To avoid certain paradoxical situations,
involving infinite internal storage in the transducer
doing the translating, it is desirable to first limit the group
G to translations possible in transducers having a finite
number of possible internal states. The information associated
with a process may be denoted by a single letter, say x. Thus
x = y means that y can be obtained by a translation of x, and
conversely. It is possible to set up a metric satisfying the
usual postulates as follows:

$$\rho(x, y) = H_x(y) + H_y(x) = 2H(x, y) - H(x) - H(y).$$

With this metric it is possible to define limiting sequences of
elements, each of which is an information. Thus a Cauchy
sequence, $x_1, x_2, \ldots$, is defined by requiring that

$$\lim_{m, n \to \infty} \rho(x_m, x_n) = 0.$$

The introduction of these sequences as new elements (analogous
to irrational numbers) completes the space in a satisfactory
way and enables one to simplify the statement of various results.
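For a concrete feel of the metric, the distance 2H(x, y) - H(x) - H(y) can be evaluated on a single pair of letters. This is a simplified sketch, assuming sources that produce independent letters so that one joint distribution describes the whole process; the function names H and rho and the two sample distributions are illustrative assumptions.

```python
import math

def H(dist):
    """Entropy in bits of a distribution given as a mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def rho(joint):
    """Distance 2H(x, y) - H(x) - H(y); joint maps (x, y) pairs to probabilities."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return 2 * H(joint) - H(px) - H(py)

# y is a reversible translation of x (each digit inverted): the distance is zero.
inverted = {(0, 1): 0.5, (1, 0): 0.5}
# x and y are independent fair binary digits: the distance is H(x) + H(y).
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

if __name__ == "__main__":
    print("rho for a reversible translation:", rho(inverted))
    print("rho for independent digits:      ", rho(independent))
```

The first case illustrates why two translations of the same source count as the same information element, and the second shows the distance growing as the sources share less.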

4. The Information Lattice

A relation of inclusion, x ≥ y, between two information
elements x and y can be defined by

$$x \ge y \;\equiv\; H_x(y) = 0.$$

This essentially requires that y can be obtained by a suitable
finite state operation (or limit of such operations) on x. If
x ≥ y we call y an abstraction of x. If x ≥ y, y ≥ z, then
x ≥ z. If x ≥ y, then H(x) ≥ H(y). Also x > y means x ≥ y,
x ≠ y. The information element, one of whose translations is
the process which always produces the same symbol, is the 0
element, and x ≥ 0 for any x.

The sum of two information elements, z = x + y, is the
process which produces the ordered pairs $(x_n, y_n)$. We have

$$z \ge x, \qquad z \ge y$$

and there is no u < z with these properties; z is the least upper
bound of x and y.

The product z = xy is defined as the largest z such
that z ≤ x, z ≤ y; that is, there is no u > z which is an
abstraction of both x and y. The product is unique.

With these definitions information elements form a
metric lattice. The lattice is not distributive, nor even
modular. A non-distributive example is x, y independent
sequences of binary digits, with z the sequence obtained by
mod 2 addition of corresponding symbols in x and y. Then

zy + zx = 0 + 0 = 0
z(x + y) = z ≠ 0.
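The entropies behind this example can be tabulated for a single letter of each sequence. The sketch below assumes fair, independent binary digits and checks only entropy relations: that z is fully determined by the pair (x, y), while x alone or y alone leaves z completely uncertain. It does not compute lattice products directly, and the helper names (H, marginal, cond_entropy) are hypothetical.

```python
import math
from itertools import product

def H(dist):
    """Entropy in bits of a distribution given as a mapping outcome -> probability."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, keep):
    """Distribution of the coordinates listed in keep (a tuple of indices)."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[k] for k in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def cond_entropy(joint, given, target):
    """H_given(target) = H(given, target) - H(given)."""
    return H(marginal(joint, given + target)) - H(marginal(joint, given))

# Outcomes are triples (x, y, z) with z = x + y (mod 2); x and y are fair and independent.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}

if __name__ == "__main__":
    print("H_(x+y)(z):", cond_entropy(joint, (0, 1), (2,)))  # z is an abstraction of x + y
    print("H_x(z):    ", cond_entropy(joint, (0,), (2,)))    # x alone tells nothing about z
    print("H_y(z):    ", cond_entropy(joint, (1,), (2,)))    # y alone tells nothing about z
```

These values are consistent with z being contained in x + y while sharing nothing with x or with y separately, which is what makes the example fail the distributive law.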

The lattices are relatively complemented. There
exists for x ≤ y a z with

z + x = y
zx = 0.

The element z is not, in general, unique.

5. The Delay Free Group G_1

The definition of equality for information based on
the group G allows x = y when y is, for example, a delayed
version of x: $y_n = x_{n+a}$. In some situations, when one must
act on information at a certain time, a delay is not permissible.
In such a case we may consider the more restricted
group $G_1$ of instantaneously reversible translations. One may
define inclusion, sum, product, etc., in an analogous way, and
this also leads to a lattice but of much greater complexity
and with many different invariants.

Proof  of  an  Integration  Formula 

C. E.  Shannon 

The integral

$$f_N(\alpha) = \int_0^{\alpha} \frac{\sin^2 Nx}{\sin^2 x}\, dx \qquad (1)$$

has arisen in an acoustical problem. It has been evaluated for N = 1, 2, 3, 4 as
equal to

$$g_N(\alpha) = \alpha N + \sum_{i=1}^{N-1} \frac{N-i}{i} \sin 2i\alpha \qquad (2)$$

by R. C. Jones, and he has conjectured that $f_N = g_N$ for all α, N. A general
proof follows.

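From (1) we have

$$\Delta_N^2 f_N = f_N - 2f_{N-1} + f_{N-2} = -\frac{1}{2}\int_0^{\alpha} \frac{\cos 2Nx - 2\cos 2(N-1)x + \cos 2(N-2)x}{\sin^2 x}\, dx$$

$$\frac{d}{d\alpha}\,\Delta_N^2 f_N(\alpha) = -\frac{1}{2}\,\frac{\cos 2N\alpha - 2\cos 2(N-1)\alpha + \cos 2(N-2)\alpha}{\sin^2 \alpha} \qquad (3)$$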

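Also from (2)

$$\Delta_N g_N = \alpha + \sum_{i=1}^{N-1} \frac{1}{i} \sin 2i\alpha$$

$$\Delta_N^2 g_N(\alpha) = \frac{\sin 2(N-1)\alpha}{N-1}$$

$$\frac{d}{d\alpha}\,\Delta_N^2 g_N(\alpha) = 2\cos 2(N-1)\alpha \qquad (4)$$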
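The equality of (3) and (4) can be established by noting that the numerator of (3),

$$\cos 2N\alpha - 2\cos 2(N-1)\alpha + \cos 2(N-2)\alpha$$
$$= \mathrm{Re}\left[e^{j2N\alpha} - 2e^{j2(N-1)\alpha} + e^{j2(N-2)\alpha}\right]$$
$$= \mathrm{Re}\left\{e^{j2(N-1)\alpha}\left[e^{j2\alpha} - 2 + e^{-j2\alpha}\right]\right\}$$
$$= \mathrm{Re}\left\{e^{j2(N-1)\alpha}\,(2j\sin\alpha)^2\right\}$$
$$= -\mathrm{Re}\left\{4\sin^2\alpha\; e^{j2(N-1)\alpha}\right\} = -4\sin^2\alpha\,\cos 2(N-1)\alpha$$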

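but $\Delta_N^2 g_N(0) = \Delta_N^2 f_N(0) = 0$, so that

$$\Delta_N^2 g_N(\alpha) = \Delta_N^2 f_N(\alpha);$$

also it has been verified that

$$g_1(\alpha) = f_1(\alpha)$$
$$g_2(\alpha) = f_2(\alpha)$$

Hence it follows in general that

$$g_N(\alpha) = f_N(\alpha).$$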
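The formula can also be spot-checked numerically. The sketch below assumes the integrand is sin^2(Nx) / sin^2(x) and the closed form is alpha N plus the sum of ((N - i)/i) sin(2 i alpha) for i from 1 to N - 1; the integration uses composite Simpson's rule, and the names integrand, f and g, the step count and the sample alpha are illustrative choices.

```python
import math

def integrand(N, x):
    """sin^2(Nx) / sin^2(x), with the limiting value N^2 where sin x vanishes."""
    s = math.sin(x)
    if abs(s) < 1e-12:
        return float(N * N)
    return (math.sin(N * x) / s) ** 2

def f(N, alpha, steps=2000):
    """Integral from 0 to alpha by composite Simpson's rule (steps must be even)."""
    h = alpha / steps
    total = integrand(N, 0.0) + integrand(N, alpha)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(N, k * h)
    return total * h / 3

def g(N, alpha):
    """Closed form: alpha * N + sum over i of ((N - i) / i) * sin(2 i alpha)."""
    return alpha * N + sum((N - i) / i * math.sin(2 * i * alpha) for i in range(1, N))

if __name__ == "__main__":
    alpha = 1.0
    for N in range(1, 7):
        print(N, f(N, alpha), g(N, alpha))
```

For each N the two printed columns agree to within the accuracy of the numerical integration, which is what the proof above establishes in general.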

A Digital Method of Transmitting Information

It is possible by [illegible] of modulation to improve
one aspect of a system for transmitting information at the
expense of others. The various variables which may be exchanged are

i,     uaitty  of  rocoivo*  oigoel,  ftiiica  ess  bo  rou^iJ/ 

SMMMHtrwS  la  *««HM»  t>/  S&0  tO  £13 1 00 



£•     TtttiiBZi  2 1%9?  yc**r»p. 

S.    tlm  of  troossUooi£A» 

ft.     BoiOO  4*4  t&O  OJKfeOtt* 

aoooroX  tteojr*  of  bow  tfeooo  voriofcioo  oro  roiotoO  «*4  tSm 

liivwi»«d  oafi  will  oe  &«volopo4  la  a  forthoofclas  soaorwifim. 
Bo»oo«r  «poofcitt&  x-.Ht*M/  *&4  oa&or  «  sus&ber  of  oojJUioay  0001*09-  - 

f ol2ooXm«  e^ufitioos 

a  ■  f  if  y  10  {*) 

3  *  «  aooouro  Of  4ii*t0rtiGji  at  tftt  **««tv*r 

t  *         *f  trooonlooiaa 

*  •  bsaa  iriiia  ©f  tro-ts&ittor 

ST  *  aciso  j-«w«T  £*30|t?fl   ti:«t  1»  t&O  O&iOO  ?OW*r 
p#r  *Ait  tw?.i4  oil  Hi,  *>*«&r*e»  tolas 

alalia  *s  flfci  is  toe  rofii«»  u^At-?  *fi>.:mlaar*tioa 

yjUUi  ftmi  tautt  koojMtag  rooolToft  <|ooli*jr  istojr&ottt 
oo  aor  0100010  t,      F  «M  £  1a  r*rio*»o         o*>  loo*  ft*  oo 
kooo  tl*o  gpam  ©f  t&«  foooHoo* 

r  1 21 

«fcoro  £«*  an£  %  or«  too  WUl  triuioatttor  tatar  ao4  acl«o 
QJQjSgf,  **ria«  too  traaftftlsalast  tiao.    ^»  fcr  •sa«pl«  t/jr  to- 
oroosiog  btutf  wUto  oo  ooo  eoorofioo  tra&o&ittor  -  tU« 

m&a&m&t  10  la  «a«  ooaoo  vor*  foooroolo  »iae*  It  lit  «  log-«  *moj  o**lag  aulto  or  boaA  oJUitfc  AlvMoo  t&o  o*or«r 
»jf  a  ft*  tor. 

»ro  two  »*tbfld«  of  fetter Sag  o1&ao1  *»  aaloo  rotlo  «t  too  ox»«ooo 
of  boo*  «i*to.    BoltOor  of  titooo  Jkwovo*  Is  by  oor  msw*  eftUud 
l&  too  ozobooso.   Sfco  $roooal  aoKomotoa  toooriooo  o  sow  ootfaoo 
at  its  t&Uft  oosootlollr  too  aoxtwai  e*oias  of  olgool 

pmm*  io  oofelovoi  for  o  $lm  oo**  wlata  laero*oo*    &U  4coo 
not  «oo£  toot  «t«  ftfotoa  of  troaoaiooieo  lo  •  tooorotioaHf 
Uool  ono  for  tkoro  oro  oororol  otHor  aooo*  of  iss$*miM*  ro- 
ooivoi  qooJLU*  fcooola*  f .  *.  ?  *o&  *  flxoi  -  «**t  tfclo  oro  too 
to  to  yWlt  m  ooarlr  tAool  oireonago  roto  ootooo*  too 

anlM  1m  Oaa^L  fift  Um  of  OOOlloo  fcfa*  YOl&OC 

of  too  lopot  ytoolotlag  fomoUoa  (too  o$oooa  faootloo  la  tolo- 
saoao  oaa  roftle)  ot  o  00300000  of  rofolorXr  ooboo*  oooyllat 

Thus     t«8  +  4~£**l  , 
Oi     *5  --«  4-4-2  +  1 

A transmitter for this system could be built in the
following way. A condenser is charged as usual to the sampled
voltage. This voltage is read on a comparator biased up to
half the [illegible]. If the comparator gives a positive indication
an electronic switch is closed feeding a negative pulse of $2^n$
units of charge into the condenser; if not a positive pulse of
$2^n$ units is fed in. The comparator is now switched to control
a new pulse source which produces pulses of $2^{n-1}$ units and the
process is repeated. Thus the circuit feeds in positive or
negative pulses of decreasing magnitude "hunting" for a balance.
At each stage a recorder remembers whether a positive or negative
pulse was used. These positive and negative recordings actually
are the binary representation of the original voltage, as one
can see by reading the above table with -1 replaced by 0. Hence
the receiver of Fig. 4 can be used without alteration in this
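The hunting procedure described above can be imitated in a few lines. This is a sketch under assumptions that are not in the surviving text: the comparator bias is taken as half of full scale, the first pulse as a quarter of full scale, and a 1 is recorded whenever the comparator indication is positive. The function name hunt and its parameters are hypothetical.

```python
def hunt(sample, bits, full_scale):
    """Feed pulses of halving size into the condenser and record the comparator indications."""
    midpoint = full_scale / 2      # comparator bias (assumed: half of full scale)
    pulse = full_scale / 4         # size of the first pulse (assumed)
    charge = sample                # voltage held on the condenser
    record = []
    for _ in range(bits):
        if charge >= midpoint:     # positive indication: feed in a negative pulse
            charge -= pulse
            record.append(1)
        else:                      # otherwise feed in a positive pulse
            charge += pulse
            record.append(0)
        pulse /= 2                 # switch to the next, smaller pulse source
    return record

if __name__ == "__main__":
    for voltage in (0, 5, 11.3, 15):
        print(voltage, hunt(voltage, bits=4, full_scale=16))
```

With these conventions the recorded indications for a whole-number voltage between 0 and 15 are its four binary digits, most significant first, so the record can be decoded as an ordinary binary number.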


Creative  Thinking 


Up  to  100%  of  the  amount  of  ideas  produced,  useful  good 
ideas  produced  by  these  signals,  these  are  supposed  to  be  arranged 
in  order  of  increasing  ability.    At  producing  ideas,  we  find  a 
curve  something  like  this.    Consider  the  number  of  curves  produced 
here  -  going  up  to  enormous  height  here, 

A very small percentage of the population produces the
greatest proportion of the important ideas. This is akin to an
idea presented by an English mathematician, Turing, that the human
brain is something like a piece of uranium. The human brain, if
it is below the critical lap and you shoot one neutron into it,
additional more would be produced by impact. It leads to an
extremely explosive [...] of the issue, increase the size of
the uranium. Turing says this is something like ideas in the human
brain. There are some people if you shoot one idea into the brain,
you will get a half an idea out. There are other people who are
beyond this point at which they produce two ideas for each idea
sent in. Those are the people beyond the knee of the curve. I
don't want to sound egotistical here, I don't think that I am
beyond the knee of this curve and I don't know anyone who is. I
do know some people that were. I think, for example, that anyone
will agree that Isaac Newton would be well on the top of this
curve. When you think that at the age of 25 he had produced enough


science, physics and mathematics to make 10 or 20 men famous - he
produced binomial theorem, differential and integral calculus, laws
of gravitation, laws of motion, decomposition of white light, and
so on. Now what is it that shoots one up to this
part of the curve? What are the basic requirements? I think we
could set down three things that are fairly necessary for scientific
research or for any sort of inventing or mathematics or
physics or anything along that line. I don't think a person can
get along without any one of these three.

The first one is obvious - training and experience.
You don't expect a lawyer, however bright he may be, to give you
a new theory of physics these days or mathematics or engineering.

The second thing is a certain amount of intelligence or
talent. In other words, you have to have an IQ that is fairly high to do
good research work. I don't think that there is any good engineer
or scientist that can get along on an IQ of 100, which is the
average for human beings. In other words, he has to have an IQ
higher than that. Everyone in this room is considerably above
that. This, we might say, is a matter of environment; intelligence
is a matter of heredity.

Those  two  I  don't  think  are  sufficient.    I  think  there  is 
a  third  constituent  here,  a  third  component  which  is  the  one  that 
makes  an  Einstein  or  an  Isaac  Newton.    For  want  of  a  better  word, 
we  will  call  it  motivation.    In  other  words,  you  have  to  have  some 
kind  of  a  drive,  some  kind  of  a  desire  to  find  out  the  answer,  a 
desire  to  find  out  what  makes  things  tick.    If  you  don't  have  that, 
you  may  have  all  the  training  and  intelligence  in  the  world,  you 
don't  have  questions  and  you  won't  just  find  answers.    This  is  a 
hard thing to put your finger on. It is a matter of temperament
probably; that is, a matter of probably early training, early
childhood experiences, whether you will motivate in the direction of
scientific research. I think that at a superficial level, it is blended
use of several things. This is not any attempt at a deep analysis at
all, but my feeling is that a good scientist has a great deal of what
we can call curiosity. I won't go any deeper into it than that. He
wants to know the answers. He's just curious how things tick and he
wants to know the answers to questions; and if he sees things, he wants
to raise questions and he wants to know the answers to those.

Then  there's  the  idea  of  dissatisfaction.    By  this  I  don't 
mean  a  pessimistic  dissatisfaction  of  the  world  -  we  don't  like  the 
way  things  are  -  I  mean  a  constructive  dissatisfaction.    The  idea 
could be expressed in the words, "This is OK, but I think things could
be done better. I think there is a neater way to do this. I think
things could be improved a little." In other words, there is
continually a slight irritation when things don't look quite right; and
I think that dissatisfaction in present days is a key driving force
in good scientists.

And another thing I'd put down here is the pleasure in seeing
net results or methods of arriving at results needed, designs of
engineers,  equipment,  and  so  on.    I  get  a  big  bang  myself  out  of  proving 
a  theorem.    If  I've  been  trying  to  prove  a  mathematical  theorem  for 
a  week  or  so  and  I  finally  find  the  solution,  I  get  a  big  bang  out  of 
it. And I get a big kick out of seeing a clever way of doing some
engineering problem, a clever design for a circuit which uses a very
small amount of equipment and gets apparently a great deal of result
out of it. I think so far as motivation is concerned, it is maybe a
little like Fats Waller said about swing music - either you got it or
you ain't. If you ain't got it, you probably shouldn't be doing
research work if you don'